IEA International Computer and 
Information Literacy Study 2018 


TECHNICAL REPORT 


Edited by 
Julian Fraillon 
John Ainley 
Wolfram Schulz 
Tim Friedman 
Daniel Duckworth 




Contributors

John Ainley
Ralph Carstens
Alex Daraganov
Sandra Dohr
David Ebbs
Julian Fraillon
Tim Friedman
Sebastian Meyer
Ekaterina Mikheeva
Lauren Musu
Louise Ockwell
Wolfram Schulz
Sabine Tieck


Julian Fraillon
The Australian Council for Educational Research
Camberwell, Victoria
Australia

Tim Friedman
The Australian Council for Educational Research
Camberwell, Victoria
Australia

John Ainley
The Australian Council for Educational Research
Camberwell, Victoria
Australia

Daniel Duckworth
The Australian Council for Educational Research
Camberwell, Victoria
Australia

Wolfram Schulz
The Australian Council for Educational Research
Camberwell, Victoria
Australia


IEA 

Keizersgracht 311 

1016 EE Amsterdam 

The Netherlands 

Telephone: +31 20 625 3625 
Fax: +31 20 420 7136
Email: secretariat@iea.nl 
Website: www.iea.nl 


ISBN/EAN: 9789079549351 


© International Association for the Evaluation of Educational Achievement (IEA) 2020 


All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted 
in any form or by any means, electronic, electrostatic, magnetic tape, mechanical, photocopying, recording or 
otherwise without permission in writing from the copyright holder. 


The International Association for the Evaluation of Educational Achievement (IEA), with headquarters in Amsterdam, is an independent, international cooperative of national research institutions and governmental research agencies. It conducts large-scale comparative studies of educational achievement and other aspects of education, with the aim of gaining in-depth understanding of the effects of policies and practices within and across systems of education.


Design by Becky Bliss Design and Production, Wellington, New Zealand 
Cover design by Studio Lakmoes, Arnhem, The Netherlands 


Contents 


List of tables and figures

Chapter 1: Overview of the IEA International Computer and Information Literacy Study 2018
Introduction
Instruments
Computer-based test delivery
Measures
Data
Outline of the technical report
References

Chapter 2: ICILS 2018 test development
Introduction
Test scope and format
Test-development process
Field trial test design and content
Main survey test design and content
Released CIL test module and CT tasks
References

Chapter 3: Computer-based assessment systems
Introduction
ICILS 2018 computer-based components and architecture
Developing the computer-based delivery platform for ICILS 2018
Challenges with computer-based delivery in ICILS 2018

Chapter 4: ICILS 2018 questionnaire development
Introduction
The conceptual framework used to guide questionnaire development
Development of the ICILS context questionnaires
Development of the student questionnaire
Development of the teacher questionnaire
Development of the school principal and ICT coordinator questionnaires
Development and implementation of the national contexts survey
References

Chapter 5: Instrument preparation and verification of the ICILS 2018 study instruments
Introduction
Adaptation and translation of the ICILS study instruments
International verification processes
Summary

Chapter 6: Sampling design and implementation
Introduction
Target population definitions
Coverage and exclusions
School sampling design
Within-school sampling design
Sample size requirements
Efficiency of the ICILS 2018 sample design
References


ICILS 2018 TECHNICAL REPORT 


Chapter 7: Sampling weights, non-response adjustments, and participation rates 79 
Introduction 79 
Types of sampling weights 79 
Calculating student weights 80 
Calculating teacher weights 82 
Calculating school weights 83 
Calculating participation rates 84 
ICILS 2018 standards for sampling participation 89 
References 93 

Chapter 8: ICILS 2018 field operations 95 

Introduction 95
Field operations personnel 95 
Field operations resources 96 
Field operations processes 98 
Field trial procedures 106 
Summary 107 

Chapter 9: Quality assurance procedures for ICILS 2018 109 
Introduction 109 
International quality control program 109 
Survey activities questionnaire 116 
Summary 120 

Chapter 10: Data management and creation of the ICILS 2018 database 121 
Introduction 121 
Data sources 121 
Confirming the integrity of the national databases 124 
The ICILS 2018 international database 131 
Summary 132 
References 132 

Chapter 11: Scaling procedures for ICILS 2018 test items 133 
Introduction 133 
The scaling model 133 
Test coverage and item dimensionality 134 
Assessment of item fit 136 
Dimensionality and local dependence 140 
Assessment of scorer reliabilities 141 
Differential item functioning by gender 144 

National reports with item statistics 146
Cross-national measurement equivalence 146
Evaluating the impact of missing responses 148
International item calibration and test reliability 150
CIL and CT estimates 152
Equating CIL scores from ICILS 2013 and 2018 154
The development of proficiency levels for CIL 156 
References 158 


Chapter 12: Scaling procedures for ICILS 2018 questionnaire items 159
Introduction 159
Simple indices 159
Scaling procedures 164
Item response modeling 166
Describing questionnaire scale indices 168
Scaled indices 170
References 218

Chapter 13: The reporting of ICILS 2018 results 221
Overview 221
Estimation of sampling variance 221
Estimation of imputation variance for CIL and CT scores 224
Reporting of differences 225
Multiple regression modeling of teacher data 228
Hierarchical linear modeling to explain variation in students’ CIL and CT 230
References 233

Appendices
Appendix A: Organizations and individuals involved in ICILS 2018 235
Appendix B: Characteristics of national samples 239
Appendix C: National items excluded from scaling 257
Appendix D: Student background variables used for conditioning 259


List of tables and figures 


Tables
Table 2.1: Test development processes and timeline
Table 2.2: Summary of ICILS 2018 CIL test modules and large tasks
Table 2.3: Field trial CIL test module composition by items derived from tasks
Table 2.4: Field trial CT test module composition by items derived from tasks
Table 2.5: Field trial test form design and contents
Table 2.6: Field trial CIL item mapping to the CIL framework
Table 2.7: Field trial CT item mapping to the CT framework
Table 2.8: Main survey CIL test module composition by items derived from tasks
Table 2.9: Main survey CT test module composition by items derived from tasks
Table 2.10: Main survey test form design and contents
Table 2.11: Main survey CIL item mapping to the CIL framework
Table 2.12: Main survey CT item mapping to the CT framework
Table 2.13: Example scoring of the correctness of student coding responses
Table 2.14: Example scoring of the efficiency of student coding responses
Table 2.15: Example scoring of correctness and efficiency combined
Table 3.1: ICILS computer-based or computer-supported systems and operations
Table 3.2: Unweighted participation status percentages across participants based on full database and original coding by test administrators (reference: all sampled students)
Table 4.1: Mapping of variables to contextual framework with examples
Table 5.1: Languages used for the ICILS 2018 study instruments
Table 6.1: Percentages of schools excluded from the ICILS target population
Table 6.2: Percentages of students excluded from the ICILS target population
Table 6.4: School, student, and teacher sample sizes
Table 6.5: Design effects of main outcome variables - Student survey
Table 6.6: Design effects and effective sample sizes of mean of scale scores and plausible values - Student survey
Table 6.7: Design effects of main outcome variables - Teacher survey
Table 7.1: Unweighted school and student participation rates - Student survey
Table 7.2: Weighted school and student participation rates - Student survey
Table 7.3: Unweighted school and teacher participation rates - Teacher survey
Table 7.4: Weighted school and teacher participation rates - Teacher survey
Table 7.5: Achieved participation categories by country
Table 8.1: Hierarchical identification codes
Table 8.2: Timing of the ICILS assessment
Table 9.1: Preparation of the testing room
Table 9.2: Test administrator adherence to the administration script
Table 9.3: Technical problems experienced with procedures or devices
Table 9.4: Teacher and student listing forms and teacher tracking forms used for the assessment
Table 9.5: School coordinator information


International CIL item-rest correlations and weighted item fit
International CT item-rest correlations and weighted item fit
CIL item pairs showing local dependence
Percentages of scorer agreement for constructed-response CIL and CT test items
Gender DIF estimates for CIL test items
Gender DIF estimates for CT test items
Percentages of omitted responses and items not reached due to lack of time for CIL items
Percentages of omitted responses and items not reached due to lack of time for CT items
Percentages of omitted responses and items not reached due to lack of time overall, by module
Final parameter estimates of the CIL items
Final parameter estimates of the CT items
Transformation parameters for new ICILS 2018 questionnaire scales (means and standard deviations of original IRT logit scores)
Factor loadings and reliabilities for the national index of students’ socioeconomic background
Reliabilities for scales measuring students’ use of ICT for activities
Item parameters for scales measuring students’ use of ICT for activities
Reliabilities for scales measuring students’ use of ICT for communication activities
Item parameters for scales measuring students’ use of ICT for communication activities
Reliabilities for scale measuring students’ use of ICT for leisure activities
Item parameters for scale measuring students’ use of ICT for leisure activities
Reliabilities for scale measuring students’ use of ICT for study purposes
Item parameters for scale measuring students’ use of ICT for study purposes
Reliabilities for scales measuring students’ reports on the use of ICT for class activities
Item parameters for scales measuring students’ reports on the use of ICT for class activities
Reliabilities for scales measuring students’ perceptions of school learning of ICT and coding tasks
Item parameters for scales measuring students’ perceptions of school learning of ICT and coding tasks
Reliabilities for scales measuring students’ ICT self-efficacy
Item parameters for scales measuring students’ ICT self-efficacy
Reliabilities for scales measuring students’ perceptions of ICT
Item parameters for scales measuring students’ perceptions of ICT
Reliabilities for scale measuring teachers’ ICT self-efficacy
Item parameters for scale measuring teachers’ ICT self-efficacy
Reliabilities for scale measuring teachers’ emphasis on learning of ICT and coding tasks
Item parameters for scale measuring teachers’ emphasis on learning of ICT and coding tasks
Reliabilities for scale measuring teachers’ use of ICT for class activities
Item parameters for scale measuring teachers’ use of ICT for class activities
Reliabilities for scale measuring teachers’ use of ICT for teaching practices
Item parameters for scale measuring teachers’ use of ICT for teaching practices
Reliabilities for scales measuring teachers’ use of ICT tools in class
Item parameters for scales measuring teachers’ use of ICT tools in class
Reliabilities for scales measuring teachers’ perceptions of ICT resources and teacher collaboration at school
Item parameters for scales measuring teachers’ perceptions of ICT resources and teacher collaboration at school
Reliabilities for scales measuring teachers’ participation in ICT-related professional learning
Item parameters for scales measuring teachers’ participation in ICT-related professional learning
Reliabilities for scales measuring teachers’ perceptions of positive and negative outcomes of using ICT for teaching and learning
Item parameters for scales measuring teachers’ perceptions of positive and negative outcomes of using ICT for teaching and learning
Reliabilities for scales measuring principals’ use of ICT
Item parameters for scales measuring principals’ use of ICT
Reliabilities for scale measuring principals’ views on using ICT
Item parameters for scale measuring principals’ views on using ICT
Reliabilities for scales measuring principals’ reports on expected ICT knowledge and skills of teachers
Item parameters for scales measuring principals’ reports on expected ICT knowledge and skills of teachers
Reliabilities for scales measuring school principals’ reports on priorities for ICT use at schools
Item parameters for scales measuring school principals’ reports on priorities for ICT use at schools
Reliabilities for scale measuring school ICT coordinators’ reports on the availability of digital resources at school
Item parameters for scale measuring ICT coordinators’ reports on the availability of digital resources at school
Reliabilities for scales measuring ICT coordinators’ reports on hindrances to the use of ICT for teaching and learning at school
Item parameters for scales measuring ICT coordinators’ reports on hindrances to the use of ICT for teaching and learning at school
Number of jackknife zones in national samples
Example for computation of replicate weights
National averages for CIL with standard deviations, sampling, and overall errors
National averages for CT with standard deviations, sampling, and overall errors
ICILS 2018 teachers included in multiple regression analyses of teachers’ emphasis on teaching CIL and CT-related skills
Coefficients of missing indicators in multilevel analysis of CIL and CT data
ICILS 2018 students included in multilevel analyses of variation in CIL and CT
Allocation of student sample in Chile
Allocation of teacher sample in Chile
Allocation of student sample in Denmark
Allocation of teacher sample in Denmark
Allocation of student sample in Finland
Allocation of teacher sample in Finland
Allocation of student sample in France
Allocation of teacher sample in France
Allocation of student sample in Germany
Allocation of teacher sample in Germany
Allocation of student sample in Italy
Allocation of teacher sample in Italy
Allocation of student sample in Kazakhstan
Allocation of teacher sample in Kazakhstan
Allocation of student sample in Korea
Allocation of teacher sample in Korea
Allocation of student sample in Luxembourg
Allocation of teacher sample in Luxembourg
Allocation of student sample in Portugal
Allocation of teacher sample in Portugal
Allocation of student sample in the United States
Allocation of teacher sample in the United States
Allocation of student sample in Uruguay
Allocation of teacher sample in Uruguay
Allocation of student and teacher sample in Moscow, Russian Federation
Allocation of student sample in North Rhine-Westphalia, Germany
Allocation of teacher sample in North Rhine-Westphalia, Germany


Figures
Figure 4.1: Contexts for ICILS 2018 CIL/CT learning outcomes
Figure 5.1: ICILS 2018 instrument preparation workflow
Figure 6.1: Visualization of PPS systematic sampling
Figure 6.2: Illustration of sampling precision - simple random sampling
Figure 6.3: Sampling precision with equal sample sizes - simple random sampling versus cluster sampling
Figure 7.1: Participation categories in ICILS
Figure 8.1: Activities with schools
Figure 10.1: Overview of data processing at IEA
Figure 11.1: Mapping of CIL student abilities and item difficulties
Figure 11.2: Mapping of CT student abilities and item difficulties
Figure 11.3: Item characteristic curve by score for dichotomous item RO2Z
Figure 11.4: Item characteristic curve by category for item BO8Z
Figure 11.5: Local dependence within CIL modules
Figure 11.6: Example of item statistics provided to national centres
Figure 11.7: Example of item-by-country interaction graph for item HO7E
Figure 11.8: Relative item difficulties for CIL common items in 2013 and 2018
Figure 11.9: CIL proficiency level cut-points and percentage of students at each level
Figure 12.3: Confirmatory factor analysis of items measuring students’ use of ICT for activities
Figure 12.4: Confirmatory factor analysis of items measuring students’ use of ICT for communication activities
Figure 12.5: Confirmatory factor analysis of items measuring students’ use of ICT for leisure activities
Figure 12.6: Confirmatory factor analysis of items measuring students’ use of ICT for study purposes
Figure 12.7: Confirmatory factor analysis of items measuring students’ reports on the use of ICT for class activities
Figure 12.8: Confirmatory factor analysis of items measuring students’ perceptions of school learning of ICT and coding tasks
Figure 12.9: Confirmatory factor analysis of items measuring students’ ICT self-efficacy
Figure 12.10: Confirmatory factor analysis of items measuring students’ perceptions of ICT
Figure 12.11: Confirmatory factor analysis of items measuring teachers’ ICT self-efficacy
Figure 12.12: Confirmatory factor analysis of items measuring teachers’ emphasis on learning of ICT and coding tasks
Figure 12.13: Confirmatory factor analysis of items measuring teachers’ use of ICT for class activities 194
Figure 12.14: Confirmatory factor analysis of items measuring teachers’ use of ICT for teaching practices 196
Figure 12.15: Confirmatory factor analysis of items measuring teachers’ use of ICT tools in class 198
Figure 12.16: Confirmatory factor analysis of items measuring teachers’ perceptions of ICT resources and teacher collaboration at school 200
Figure 12.17: Confirmatory factor analysis of items measuring teachers’ participation in ICT-related professional learning 202
Figure 12.18: Confirmatory factor analysis of items measuring teachers’ perceptions of positive and negative outcomes of using ICT for teaching and learning 204
Figure 12.19: Confirmatory factor analysis of items measuring school principals’ use of ICT 206
Figure 12.20: Confirmatory factor analysis of items measuring school principals’ views on using ICT 208
Figure 12.21: Confirmatory factor analysis of items measuring principals’ reports on expected ICT knowledge and skills of teachers 210
Figure 12.22: Confirmatory factor analysis of items measuring school principals’ reports on priorities for ICT use at schools 212
Figure 12.23: Confirmatory factor analysis of items measuring school ICT coordinators’ reports on the availability of digital resources at school 214
Figure 12.24: Confirmatory factor analysis of items measuring ICT coordinators’ reports on hindrances to the use of ICT for teaching and learning at school 216


CHAPTER 1:
Overview of the IEA International Computer and Information Literacy Study 2018

Julian Fraillon, Sebastian Meyer, and John Ainley


Introduction 


The International Association for the Evaluation of Educational Achievement (IEA) International Computer and Information Literacy Study (ICILS) 2018 investigated how well students are prepared for study, work, and life in a digital world. There is increasing acknowledgment across countries that, with the rapid advancement of new technologies, it is important to develop the capacities of people to use information and communication technologies (ICT) (see, for example, European Commission 2018). ICILS 2018 focused on “the capacities of students to use ICT productively for a range of purposes, in ways that go beyond a basic use of ICT” (Fraillon et al. 2019, p. 1). ICILS 2018 was based on and expanded the work of ICILS 2013 (Fraillon et al. 2014).


ICILS 2018 included three main focus areas. Firstly, ICILS 2018 assessed computer and information literacy (CIL), which was defined as “an individual’s ability to use computers to investigate, create, and communicate in order to participate effectively at home, at school, in the workplace, and in society” (Fraillon et al. 2019, p. 16). CIL was also assessed in the first study cycle, ICILS 2013. Secondly, as an optional component for participating countries, ICILS 2018 assessed computational thinking (CT), which is defined as the “ability to recognize aspects of real-world problems which are appropriate for computational formulation and to evaluate and develop algorithmic solutions to those problems so that the solutions could be operationalized with a computer” (Fraillon et al. 2019, p. 27). CT was not assessed in ICILS 2013. The assessment of CT was an innovative and engaging challenge for the students, evaluating not only their ability to analyze and break down a problem into logical steps but also their understanding of how computers might be used to solve a problem. Thirdly, ICILS 2018 investigated the contexts in which students’ CIL and CT are developed by collecting and analyzing data relating to the use of computers and other digital devices by students and teachers, and the resources available to support the teaching and learning of CIL and CT in schools.

ICILS systematically reviewed differences among participating countries and education systems¹ with regard to students’ CIL and CT, and how participating countries and systems provided and supported ICT-related education. It explored differences within and across countries with respect to the relationship between ICT-related learning outcomes, student characteristics, and school contexts. The outcomes of these reviews and analyses are reported in the ICILS 2018 international report (Fraillon et al. 2020).

ICILS was based around four research questions concerned with: (1) variations in CIL and CT between and within countries; (2) aspects of schools, education systems, and teaching that are related to student achievement in CIL and CT; (3) the extent to which students’ access to, familiarity with, and self-reported proficiency in using ICT are related to student achievement in CIL and CT; and (4) the aspects of students’ personal and social backgrounds that are related to CIL and CT.

The ICILS 2018 assessment framework (Fraillon et al. 2019) describes the development of these research questions. The framework also provides greater detail relating to the measured domains and outlines the variables necessary for analyses associated with the research questions.


1 Education systems are units within countries with a degree of educational autonomy that have participated following 
the same standards for sampling and testing as countries. In this report, education systems are often referred to as 
countries for ease of reading. 


Instruments

CIL test

Students completed a computer-based test of CIL that consisted of questions and tasks that were administered as five different 30-minute modules. Three of the CIL modules had been developed and used in ICILS 2013 and kept secure as trend modules. These were included to allow data collected in ICILS 2018 to be equated with data from the previous cycle and reported on the CIL proficiency scale established for ICILS 2013. Consequently, it was possible to compare CIL achievement over time in those countries that participated in both cycles. Two new modules were developed for the ICILS 2018 CIL test instrument to address contemporary thematic content and software environments. Data collected from the five CIL modules in ICILS 2018 were used as the basis for reporting ICILS 2018 CIL results on the ICILS CIL achievement scale established in 2013.

Each student completed two modules randomly allocated from the set of five available modules, so that the total assessment time for each student was one hour. Each of the assessment modules consisted of a set of questions and tasks based on a realistic theme and following a linear narrative structure. These modules consisted of a series of small discrete tasks (typically taking less than a minute to complete) followed by a large task that typically took 15 to 20 minutes to complete. In total, the modules comprised 81 discrete questions that generated 102 score points.


When students began each module they were first presented with an overview of the theme and purpose of the tasks in the module, including a basic description of what the large task would comprise. In the narrative of each module the smaller discrete tasks typically comprised a mix of skill execution and information management tasks that built towards completion of the large task. Students were required to complete the tasks in the allocated sequence and could not return to review completed tasks.

The five modules measuring students’ CIL were:

• Band competition (2013 & 2018): Students planned a website, edited an image, and used a simple website builder to create a webpage with information about a school band competition.
• Breathing (2013 & 2018): Students managed files, and evaluated and collected information in order to create a presentation explaining the process of breathing to eight- or nine-year-old students.
• School trip (2013 & 2018): Using online database tools, students helped plan a school trip and selected and adapted information to produce an information sheet about the trip for their peers. The information sheet included a map created using an online mapping tool.
• Board games (2018): Students used a school-based social network for direct messaging and group posting to encourage peers to join a board games interest group.
• Recycling (2018): Students accessed and evaluated information from a video sharing website to identify a suitable information source relating to waste reduction, reuse, and recycling. Students took research notes from the video and used their notes as the basis for designing an infographic to raise awareness about waste reduction, reuse, and recycling.

In total, there were 20 different possible combinations of module pairs. Each module appeared in eight of the combinations: four times as the first and four times as the second module when paired with each of the other four. The module combinations were randomly allocated to students. This test design made it possible to assess a larger amount of content than could be completed by any individual student and thus ensured broad coverage of the content of the ICILS 2018 assessment framework. The design also controlled for item position effects on task difficulty across the sampled students and provided a variety of contexts for the assessment of CIL.
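The pairing arithmetic described above can be checked with a short sketch. It enumerates ordered module pairs (first module, second module), confirms that there are 20 of them and that each module is first four times and second four times, and allocates a pair to each student at random. The function names, the use of module titles as identifiers, and the seeded uniform random draw are illustrative assumptions; this is not the ICILS operational allocation software.

```python
import random
from collections import Counter
from itertools import permutations

# The five CIL modules named in the text (three 2013 trend modules, two new in 2018).
MODULES = ["Band competition", "Breathing", "School trip", "Board games", "Recycling"]
MINUTES_PER_MODULE = 30

# Ordered pairs: (first module, second module). Order matters because the design
# balances the position (first or second) in which each module is administered.
PAIRS = list(permutations(MODULES, 2))


def position_counts(pairs):
    """Count how often each module appears first and second across all pairs."""
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    return first, second


def allocate(student_ids, seed=None):
    """Assign each student one module pair, drawn uniformly at random.

    Assumption: a simple independent random draw per student; the operational
    ICILS procedure may differ (e.g., a rotation that balances pair counts).
    """
    rng = random.Random(seed)
    return {sid: rng.choice(PAIRS) for sid in student_ids}


first, second = position_counts(PAIRS)
print(len(PAIRS))                                  # 20 ordered pairs = 5 * 4
print(set(first.values()), set(second.values()))   # {4} {4}
print(2 * MINUTES_PER_MODULE)                      # 60 minutes of CIL testing per student
```

Treating the pairs as ordered rather than as unordered combinations is what yields 20 rather than 10, and it is also what lets the design balance position effects: every module is seen in both the first and the second slot equally often.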




CT test

Students in those countries participating in the CT assessment completed an additional computer-based test of CT that consisted of two 25-minute modules. These modules were developed for ICILS 2018 and corresponded to the two strands of the CT construct, one of which was related to conceptualizing problems and the other to operationalizing solutions. Each module had a unifying theme and a sequence of related tasks. Unlike the CIL modules, each CT module did not culminate in a large task. Students completed the two CT modules in a randomized order after they had completed both the CIL test and the student questionnaire. In total, the CT data comprised 39 score points derived from 18 discrete tasks and questions. Student responses to most tasks were automatically scored. The exceptions were some open-response questions that were scored by trained scorers at each national center.

In the CT module focusing on conceptualizing problems, the tasks primarily focused on planning digital solutions needed to control a driverless bus. The set of tasks included manipulating and interpreting visual representations of information and processes associated with the behavior of the bus, and configuring simulations to collect data and draw conclusions about the behavior of the bus under specified conditions.

In the CT module focusing on operationalizing solutions, students worked within a simple visual coding environment comprising blocks of code that had some specified and some configurable functions and could be assembled in sequence to control the actions of a simulated drone used in farming. Students were required to create, test, and debug code-based algorithms. Students were presented with a work space, “draggable” commands, and a visual output that showed the drone completing the commands. The complexity of each task related to the number of targets and actions required to solve the problem instance, the variety of code functions that were available, and the sequence of actions required by the drone for completion of the task. The tasks became more complex as students advanced through the module.

Questionnaires

The completion of the CIL assessment was followed by a 30-minute student questionnaire, also administered on computer. The questionnaire included questions relating to students’ background characteristics, their experience and use of ICT to complete a range of different tasks in school and out of school, and their attitudes towards the use of ICT.

Three further instruments were designed to gather information from and about teachers and schools:

• A 30-minute teacher questionnaire included some questions relating to teachers’ background characteristics, followed by questions relating to teachers’ reported familiarity with ICT, their use of ICT in educational activities in teaching focused on a “reference class,”² their perceptions of ICT in schools, and their participation in professional learning activities relating to their use of ICT when teaching.
• A 15-minute ICT coordinator questionnaire asked ICT coordinators about the resources available in their school to support the use of ICT in teaching and learning. In addition, the questionnaire gathered information about schools’ technological (e.g., infrastructure, hardware, and software) and pedagogical support as well as professional learning and hindrances to the use of ICT in school education.
• A 15-minute principal questionnaire asked principals to provide information about school characteristics and then about school approaches to ICT-related teaching and incorporating ICT into teaching and learning.


2 The “reference class” was defined as the first target grade class taught by the respondent for a regular subject (i.e., other than home room, assembly, etc.) on or after the Tuesday following the last weekend before the respondent first accessed the questionnaire. See page 7 for a complete definition of the reference class.




An additional national context questionnaire was used to gather information from ICILS 2018 national 
centers about national contexts for and approaches to the development of students’ CIL and CT. 
This included information on policies and practices as well as on expectations and requirements 
for the pedagogical use of ICT. When answering this questionnaire, which was administered online, 
national centers were requested to draw on all available national expertise to provide the required 
information as well as to provide reference documents where appropriate. 


Computer-based test delivery 


ICILS 2018 used purpose-designed software for the computer-based student assessment and
questionnaire. These were administered primarily using USB drives connected to school computers.
After administration of the student instruments, the ICILS research team either directly uploaded
data to a server or submitted this information to national research centers for subsequent upload
by national center staff.


The teacher and school questionnaires were usually completed online (directly accessing a server
at IEA over the internet). However, respondents were also offered the option of completing the
questionnaires on paper.


Measures 
The CIL scale 


CIL was defined as an “individual’s ability to use computers to investigate, create, and communicate
in order to participate effectively at home, at school, in the workplace, and in society” (Fraillon et al.
2019, p. 16). The ICILS 2018 CIL construct was conceptualized around four strands that framed the
skills and knowledge addressed by the CIL instruments.³ We used the Rasch item response theory
model (Rasch 1960) to derive the CIL scale from the data collected from student responses to 81
test questions and large tasks that generated 102 score points. Most questions and tasks each
corresponded to one item. However, raters scored each ICILS large task against a set of criteria
(each criterion with its own unique set of scores) relating to the specific properties of the task.
Each large task assessment criterion can therefore be regarded as an item in ICILS.
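To make the model concrete, the following Python sketch computes the probability of each score on one item for a student with ability theta and item step difficulties delta, all in logits. For large-task criteria with more than one score point it assumes the partial credit form of the Rasch model, which this section does not state; the function name and example values are hypothetical, and this is not the ICILS scaling software.

```python
import math
from typing import List, Sequence


def rasch_category_probabilities(theta: float,
                                 step_difficulties: Sequence[float]) -> List[float]:
    """Return P(score = k) for k = 0..m under a Rasch partial credit model.

    theta: student ability in logits.
    step_difficulties: delta_1..delta_m in logits; delta_k is the difficulty of
    moving from score k-1 to score k. With a single step this reduces to the
    dichotomous Rasch model P(correct) = exp(theta - delta) / (1 + exp(theta - delta)).
    """
    # Log-numerators: 0 for score 0, then cumulative sums of (theta - delta_k).
    log_numerators = [0.0]
    for delta in step_difficulties:
        log_numerators.append(log_numerators[-1] + (theta - delta))
    # Subtract the maximum before exponentiating to avoid overflow.
    top = max(log_numerators)
    weights = [math.exp(value - top) for value in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical dichotomous item (difficulty 0.0) and a student at 1.0 logits.
    print(rasch_category_probabilities(1.0, [0.0]))
    # Hypothetical criterion scored 0, 1, or 2 with two step difficulties.
    print(rasch_category_probabilities(0.5, [-0.3, 0.8]))
```

With a single step the function returns the familiar dichotomous probabilities: a student 1 logit above an item's difficulty answers it correctly with a probability of about 0.73.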


The ICILS CIL reporting scale was established in ICILS 2013, with a mean of 500 (the average CIL
scale score across countries in 2013) and a standard deviation of 100 for the equally weighted
national samples that met IEA sample participation standards in the first cycle. Plausible values
were generated with full conditioning to derive summary student achievement statistics.
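The standardization described above amounts to a linear transformation of ability estimates. The sketch below works under simplifying assumptions (one point estimate per student instead of plausible values, no sampling weights, and no equating between cycles): it computes the mean and standard deviation with each national sample weighted equally and then rescales to a mean of 500 and a standard deviation of 100. Function names, country labels, and logit values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def equally_weighted_moments(logits: Dict[str, List[float]]) -> Tuple[float, float]:
    """Mean and SD of pooled estimates where every national sample counts equally.

    Each student in country c gets weight 1 / (n_c * C), so each country's
    weights sum to 1 / C regardless of its sample size.
    """
    n_countries = len(logits)
    mean = sum(sum(v) / (len(v) * n_countries) for v in logits.values())
    variance = sum(
        sum((x - mean) ** 2 for x in v) / (len(v) * n_countries)
        for v in logits.values()
    )
    return mean, math.sqrt(variance)


def to_reporting_metric(theta: float, mean: float, sd: float,
                        target_mean: float = 500.0, target_sd: float = 100.0) -> float:
    """Linearly map a logit onto a scale with the target mean and SD."""
    return target_mean + target_sd * (theta - mean) / sd


if __name__ == "__main__":
    # Hypothetical logit estimates for three national samples of different sizes.
    sample = {"A": [-0.5, 0.1, 0.7], "B": [0.2, 0.9], "C": [-1.0, -0.2, 0.3, 0.6]}
    m, s = equally_weighted_moments(sample)
    scaled = {c: [round(to_reporting_metric(t, m, s), 1) for t in v]
              for c, v in sample.items()}
    print(scaled)
```

Because the transformation is linear and the country weights sum to one, the rescaled estimates have an equally weighted mean of exactly 500 and standard deviation of exactly 100.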


The described scale of CIL achievement in ICILS is based on the content and scaled difficulties of
the assessment items. The ICILS research team wrote descriptors for the expected CIL knowledge,
skills, and understandings demonstrated by students who correctly responded to these items.
Ordering the item descriptors according to their scaled difficulty (from least to most difficult)
produced an item map. The content of the items was used to inform judgments about the skills
represented by groups of items ordered by difficulty on the scale.


Analysis of this item map and the student achievement data was then used to establish proficiency
levels, each with a width of 85 scale points, with level boundaries set at 407, 492, 576, and 661
scale points (rounded to the nearest whole number). Student scores below 407 scale points indicate
CIL proficiency below the lowest level targeted by the assessment instrument.
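Assigning a student to a level is a threshold lookup on the scale score. The sketch below assumes the rounded boundaries quoted above, read so that they match the level ranges given in the level descriptions (Level 1 from 407 to 491, Level 2 from 492 to 576, Level 3 from 577 to 661, Level 4 above 661). Because the published boundaries are rounded, the handling of scores lying exactly on a boundary is an assumption; official statistics would also be computed over plausible values with sampling weights rather than the single unweighted scores used here. Function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def cil_level(score: float) -> str:
    """Classify a CIL scale score using the rounded level ranges."""
    if score > 661:
        return "Level 4"
    if score > 576:
        return "Level 3"
    if score >= 492:
        return "Level 2"
    if score >= 407:
        return "Level 1"
    return "Below Level 1"


def level_distribution(scores: Iterable[float]) -> Counter:
    """Count how many (unweighted) scores fall at each level."""
    return Counter(cil_level(s) for s in scores)


if __name__ == "__main__":
    # Hypothetical scale scores.
    print(level_distribution([398.2, 455.0, 510.7, 600.3, 670.9]))
```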


3 CIL was described as comprising only two strands in ICILS 2013. Following an extensive evaluation of the ICILS 2013
CIL construct, and in consultation with ICILS national researchers, the ICILS project team established a revised structure
for the CIL construct for ICILS 2018. The restructuring of the CIL construct was undertaken to better communicate
the contents and emphases of the construct and to minimize overlap across the aspects of the construct (Fraillon et al.
2019, p. 17).




The four described levels of the CIL scale are summarized as follows:

• Level 4 (above 661 scale points): Students working at Level 4 select the most relevant information to use for communicative purposes. They evaluate usefulness of information based on criteria associated with need and evaluate the reliability of information based on its content and probable origin. These students create information products that demonstrate a consideration of audience and communicative purpose. They also use appropriate software features to restructure and present information in a manner that is consistent with presentation conventions. They then adapt that information to suit the needs of an audience. Students working at Level 4 demonstrate awareness of problems that can arise regarding the use of proprietary information on the internet.

• Level 3 (577 to 661 scale points): Students working at Level 3 demonstrate the capacity to work independently when using computers as information gathering and management tools. These students select the most appropriate information source to meet a specified purpose, retrieve information from given electronic sources to answer concrete questions, and follow instructions to use conventionally recognized software commands to edit, add content to, and reformat information products. They recognize that the credibility of web-based information can be influenced by the identity, expertise, and motives of the creators of the information.

• Level 2 (492 to 576 scale points): Students working at Level 2 use computers to complete basic and explicit information gathering and management tasks. They locate explicit information from within given electronic sources. These students make basic edits and add content to existing information products in response to specific instructions. They create simple information products that show consistency of design and adherence to layout conventions. Students working at Level 2 demonstrate awareness of mechanisms for protecting personal information and some consequences of public access to personal information.

• Level 1 (407 to 491 scale points): Students working at Level 1 demonstrate a functional working knowledge of computers as tools and a basic understanding of the consequences of computers being accessed by multiple users. They apply conventional software commands to perform basic research and communication tasks and add simple content to information products. They demonstrate familiarity with the basic layout conventions of electronic documents.


The CT scale


CT refers to an “individual’s ability to recognize aspects of real-world problems which are
appropriate for computational formulation and to evaluate and develop algorithmic solutions to
those problems so that the solutions could be operationalized with a computer” (Fraillon et al. 2019,
p. 27). The CT construct comprised two strands: conceptualizing problems and operationalizing
solutions.

We used the Rasch item response theory model (Rasch 1960) to derive the CT scale from student
responses to 18 CT tasks that generated 39 score points. The final reporting CT scale was set to
a metric that had a mean of 500 (the ICILS average score) and a standard deviation of 100 for the
equally weighted national samples in countries that administered the CT modules. As for CIL, we
used plausible values with full conditioning to derive summary student achievement statistics for CT.

The ICILS described scale of CT achievement is based on the content and scaled difficulties of the
assessment items. The ICILS research team wrote descriptors for the expected CT knowledge,
skills, and understandings demonstrated by students correctly responding to each item. Ordering
the item descriptors according to their scaled difficulty (from least to most difficult) resulted in an
item map, similar to the item map that was produced for CIL.




Given the limited number of CT tasks and score points, it was not possible to establish proficiency
levels in the same way as for CIL. However, to provide a broad description of the underlying
characteristics of achievement across the breadth of the scale, we divided the items ordered by
their difficulty into thirds with equal numbers of items in each third. For ICILS 2018 we refer to
these as the lower, middle, and upper regions of the CT scale. The descriptions of each region are
syntheses of the common elements of CT knowledge, skills, and understanding described by the
items within each region. The regions of the CT scale cannot be directly compared to the levels
in the CIL scale, as they have been developed using a different process and the scale metrics are
not comparable.

The three regions of the CT scale can be described as follows:

• Upper region (above 589 scale points): Students showing achievement corresponding to the upper region of the scale demonstrate an understanding of computation as a generalizable problem-solving framework. They can explain how they have executed a systematic approach when using computation to solve real-world problems. Furthermore, students operating within the upper region can develop algorithms that use repeat statements together with conditional statements effectively.

• Middle region (459 to 589 scale points): Students showing achievement corresponding to the middle region of the scale demonstrate understanding of how computation can be used to solve real-world problems. They can plan and execute systematic interactions with a system so that they can interpret the output or behavior of the system. When developing algorithms, they use repeat statements effectively.

• Lower region (below 459 scale points): Students showing achievement corresponding to the lower region of the scale demonstrate familiarity with the basic conventions of digital systems to configure inputs, observe events, and record observations when planning computational solutions to given problems. When developing problem solutions in the form of algorithms, they can use a linear (step-by-step) sequence of instructions to meet task objectives.
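Both steps described for the CT scale, splitting difficulty-ordered items into equal thirds and placing a student score in a region, can be sketched in Python as below. Item labels and difficulties are hypothetical. The sketch raises an error when the item count is not divisible by three, because this section does not say how a remainder would be handled, and the treatment of scores lying exactly at 459 or 589 is an assumption.

```python
from typing import Dict, List


def split_into_thirds(item_difficulties: Dict[str, float]) -> List[List[str]]:
    """Order items from least to most difficult and cut them into three equal groups."""
    ordered = sorted(item_difficulties, key=item_difficulties.get)
    if len(ordered) % 3 != 0:
        raise ValueError("equal thirds require an item count divisible by three")
    size = len(ordered) // 3
    return [ordered[:size], ordered[size:2 * size], ordered[2 * size:]]


def ct_region(score: float) -> str:
    """Place a CT scale score in a region using the quoted region boundaries."""
    if score > 589:
        return "Upper region"
    if score >= 459:
        return "Middle region"
    return "Lower region"


if __name__ == "__main__":
    # Hypothetical item difficulties in logits.
    items = {"a": 0.3, "b": -1.2, "c": 1.5, "d": 0.0, "e": -0.4, "f": 2.1}
    lower, middle, upper = split_into_thirds(items)
    print(lower, middle, upper)
    print(ct_region(430.0), ct_region(520.0), ct_region(610.0))
```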


Measures based on the student questionnaire 


A number of the measures based on the student questionnaire were single-item indices based on
student responses:

• Experience with using computers (made up of five categories based on years of using computers);
• Frequency of computer use at home, school, and other places (made up of five categories);
• Frequency of use of various applications (made up of five categories); and
• Frequency of use of computers in different subject areas (four categories and eight subject areas).

In addition, students responded to sets of items that had been designed to measure constructs
of interest and that provided the basis for generating scales reflecting these underlying latent traits.
Scales were developed to provide measures of a number of dimensions concerned with ICT
engagement. The scales included measures of the extent of use of ICT for various purposes and of
student perceptions of ICT. Measures of the extent of use of ICT included the use of ICT for general
applications, school purposes, communication and information exchange, and leisure. Measures of
student perceptions included ICT self-efficacy (in relation to general and specialist ICT applications)
and perceptions of effects of ICT on society. Scales were also generated to reflect the extent to which
students learned about CIL and CT tasks at school.




Measures based on the teacher questionnaire 


A number of the measures based on the teacher questionnaire were also single-item indices. 
Such measures included experience with using computers for teaching purposes, and frequency 


of computer use in and outside school for pedagogical and other purposes.

Sets of items were designed to generate scales reflecting underlying constructs such as:

• Teachers’ perceptions of school resources for ICT;
• Teachers’ views of ICT for teaching and learning;
• Teachers’ self-confidence in using ICT (for general and specialist applications); and
• Teachers’ collaboration with other teachers about how ICT is used.

In addition, in order to determine measures of the extent to which the teachers were using ICT
in their teaching, the teacher questionnaire asked the teachers what they did in a particular
reference class. Teachers were asked to select the reference class from among the classes each
of them was teaching, and to base their responses regarding their teaching practices on their
experiences with that particular class. To ensure that the selection was unbiased, the following
instruction was provided:

This is the first [target grade] class that you teach for a regular subject (i.e., other than home
room, assembly, etc.) on or after Tuesday following the last weekend before you first accessed
this questionnaire. You may, of course, teach the class at other times during the week as well. If
you did not teach a [target grade] class on that Tuesday, please use the [target grade] class that
you taught on the first day after that Tuesday.

Teachers provided information on how frequently they used ICT in the reference class, the subject
area they taught to the reference class, the emphasis they placed on developing students’ CIL, the
ICT tools that they used, the learning activities in which they used ICT, and the teaching practices
in which they incorporated ICT.
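The date logic of the reference-class instruction can be sketched in Python as follows. The sketch assumes that the “last weekend” is the most recent weekend whose Sunday falls strictly before the first-access date, so a teacher who first opens the questionnaire on a Saturday or Sunday refers back to the previous weekend; the instruction does not settle that case. The timetable format and function names are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple


def reference_tuesday(first_access: date) -> date:
    """Tuesday following the last weekend before the first questionnaire access.

    Assumption: the last weekend ends on the most recent Sunday strictly before
    first_access (access on a Saturday or Sunday refers to the previous weekend).
    """
    days_back_to_sunday = (first_access.weekday() - 6) % 7  # weekday(): Monday=0 ... Sunday=6
    if days_back_to_sunday == 0:
        days_back_to_sunday = 7
    last_sunday = first_access - timedelta(days=days_back_to_sunday)
    return last_sunday + timedelta(days=2)


def reference_class(first_access: date,
                    timetable: List[Tuple[date, int, str]]) -> Optional[str]:
    """First target-grade regular-subject class on or after the reference Tuesday.

    timetable: hypothetical (date, period, class_label) entries covering only the
    teacher's target-grade classes in regular subjects.
    """
    start = reference_tuesday(first_access)
    eligible = [entry for entry in timetable if entry[0] >= start]
    if not eligible:
        return None
    return min(eligible)[2]  # earliest date, then earliest period
```

Under this assumption, a teacher who first opened the questionnaire on Thursday 8 March 2018 would use the first target-grade class taught on or after Tuesday 6 March 2018.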


Measures based on the school questionnaire 


The school questionnaires (for the school principal and the ICT coordinator) provided measures of
school access to ICT resources, school policies and practices for using ICT, impediments to using
ICT in teaching and learning, and participation in teacher professional development. In addition,
those questionnaires provided information about school characteristics, school contexts, and ICT
resources.

Data from school questionnaires provided information that was used to derive both simple indices
(for example, on ratios of school size and the number of ICT devices) and scales.


Data


Countries


The twelve countries that participated in ICILS were: Chile, Denmark, Finland, France, Germany,
Italy, Kazakhstan, the Republic of Korea (hereafter referred to as Korea, for ease of reading),
Luxembourg, Portugal, the United States, and Uruguay. Moscow (Russian Federation) and North
Rhine-Westphalia (Germany) took part as benchmarking participants.

Denmark, Finland, France, Germany, Korea, Luxembourg, Portugal, the United States, and North
Rhine-Westphalia (Germany) participated in the CT international option.


Population definitions

The ICILS student population was defined as students in grade 8 (typically around 14 years of age
in most countries), provided that the average age of students in this grade was at least 13.5 years
at the time of the assessment.




The population for the ICILS teacher survey was defined as all teachers teaching regular school
subjects to the students in the target grade. It included only those teachers who were teaching the
target grade during the testing period and who had been employed at the school since the beginning
of the school year. ICILS also administered separate questionnaires to principals and nominated
ICT coordinators in each school.


Sample design
The samples were designed as two-stage cluster samples. During the first stage of sampling, PPS
procedures (probability proportional to size, as measured by the number of students enrolled in a
school) were used to sample schools within each country. The numbers required in the sample to
achieve the necessary precision were estimated on the basis of national characteristics. However,
as a guide, each country was instructed to plan for a minimum sample size of 150 schools.⁴ Sample
sizes of schools varied across countries, with the largest national sample comprising 210 schools.
The sampling of schools constituted the first stage of sampling both students and teachers.

Twenty students were then randomly sampled from all students enrolled in the target grade in each
sampled school. In schools with fewer than 20 students, all students were invited to participate.

All teachers of the target grade were eligible to be sampled regardless of the subjects they taught,
given that ICT use for teaching and learning is not restricted to particular subjects. In most schools
(those with 21 or more teachers of the target grade), 15 teachers were selected at random from
all teachers teaching the target grade. In schools with 20 or fewer such teachers, all teachers were
invited to participate.

The sample participation requirements were applied independently to students and teachers in
the selected schools. The requirement was 85 percent of the selected schools and 85 percent among
the participants (students or teachers) within the participating schools, or a weighted overall
participation rate of 75 percent.


Achieved samples
ICILS gathered data from more than 46,000 lower-secondary students in more than 2200 schools
from 14 countries or education systems. These student data were augmented by data from more
than 26,000 teachers in those schools and by contextual data collected from school ICT coordinators,
school principals, and national research centers. Eleven out of 12 countries and both benchmarking
participants met IEA sample participation standards for the student survey, and for the teacher
survey this was the case for seven out of 12 countries and both benchmarking participants. Data from
countries that did not meet the participation requirements were reported in separate sections of
the reporting tables.

The main ICILS survey took place in the 14 participating countries and benchmarking participants
between February and December 2018. The survey was carried out in countries with a Northern
Hemisphere school calendar between February and June 2018 and in those with a Southern
Hemisphere school calendar between October and December 2018.⁵ In a few cases, ICILS granted
countries an extension to their data-collection periods or collected data from their target grade
at the beginning of the next school year.
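The two-stage design and the participation requirement described under Sample design can be sketched in Python as below. The first stage uses systematic PPS selection, one common way of implementing PPS that this section does not specifically name; stratification, certainty selections for very large schools, replacement schools, and sampling weights are all omitted. School names, enrollments, and function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def systematic_pps(school_sizes: Dict[str, int], n_schools: int,
                   rng: random.Random) -> List[str]:
    """Systematic PPS: schools are hit with probability proportional to enrollment.

    A school larger than the sampling interval can be hit more than once; a full
    design would treat such schools as certainty selections.
    """
    total = sum(school_sizes.values())
    interval = total / n_schools
    start = rng.uniform(0, interval)
    points = [start + k * interval for k in range(n_schools)]
    selected: List[str] = []
    cumulative = 0.0
    i = 0
    for school, size in school_sizes.items():  # walk the frame in its given order
        cumulative += size
        while i < len(points) and points[i] < cumulative:
            selected.append(school)
            i += 1
    return selected


def sample_or_all(members: Sequence[str], k: int, take_all_up_to: int,
                  rng: random.Random) -> List[str]:
    """Within-school selection: everyone if the group is small, else a random sample of k."""
    if len(members) <= take_all_up_to:
        return list(members)
    return rng.sample(list(members), k)


def meets_participation_standard(school_rate: float, within_school_rate: float,
                                 weighted_overall_rate: float) -> bool:
    """85% of schools and 85% of participants within them, or 75% weighted overall."""
    return ((school_rate >= 0.85 and within_school_rate >= 0.85)
            or weighted_overall_rate >= 0.75)


if __name__ == "__main__":
    rng = random.Random(2018)
    frame = {"s1": 120, "s2": 40, "s3": 300, "s4": 80, "s5": 60}  # hypothetical enrollments
    schools = systematic_pps(frame, 3, rng)
    students = sample_or_all([f"student{i}" for i in range(35)], 20, 20, rng)  # 20, or all
    teachers = sample_or_all([f"teacher{i}" for i in range(25)], 15, 20, rng)  # 15 if 21+
    print(schools, len(students), len(teachers))
```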


4 In Luxembourg, the total number of eligible schools was below 50, so all schools were selected (census).


5 Italy decided to survey students and their teachers at the beginning of the school year, whereas all other countries
administered the survey at the end of the school year. Italy’s results were annotated accordingly in the international report.




Outline of the technical report 
This overview of ICILS 2018 is followed by Chapters 2 to 13. Chapters 2, 3, and 4 cover the instruments
that were used in the study. Chapter 2 focuses on the development of the tests while Chapter 3
provides an account of the computer-based assessment systems. Chapter 4 details the development
of the questionnaires used in ICILS for gathering data from students, teachers, principals, and
school ICT coordinators. The chapter also provides an outline of the development of the national
contexts survey, completed by the national research coordinators. An appreciation of the material
in these chapters provides an essential foundation for interpreting the results of the study.

Chapters 5 through 9 focus on the implementation of the survey in 2018. Chapter 5 describes
the translation procedures and national adaptations used in ICILS. Chapter 6 details the sampling
design and implementation, while Chapter 7 describes the sampling weights that were applied
and documents the participation rates that were achieved. Chapter 8, which describes the field
operations, is closely linked to Chapter 9, which reports on the feedback and observations gathered
from the participating countries during the data collection.

Chapters 10, 11, and 12 are concerned with data management and analysis. Chapter 10 describes
the data-management processes that resulted in the creation of the ICILS database. Chapter 11
details the scaling procedures for the CIL test, i.e., how the responses to tasks and items were used
to generate the scale scores and proficiency levels. Chapter 12 describes the scaling procedures for
the questionnaire items (student, teacher, and school questionnaires). The final chapter, Chapter
13, presents an account of the analyses that underpinned the international report of ICILS 2018
(Fraillon et al. 2020).


References 


European Commission. (2018). Communication from the Commission to the European Parliament, the Council,
the European Economic and Social Committee and the Committee of the Regions (COM(2018) 22 final). Brussels,
Belgium: Author. https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:52018DC0022&from=EN

European Council. (2017). European Council Conclusions, 19 October 2017 (EUCO 14/17). Brussels, Belgium:
Author. https://www.consilium.europa.eu/media/21620/19-euco-final-conclusions-en.pdf

Fraillon, J., Ainley, J., Schulz, W., Friedman, T., & Gebhardt, E. (2014). Preparing for life in a digital age: The
IEA International Computer and Information Literacy Study international report. Cham, Switzerland: Springer.
https://www.springer.com/gp/book/9783319142210

Fraillon, J., Ainley, J., Schulz, W., Duckworth, D., & Friedman, T. (2019). IEA International Computer and
Information Literacy Study 2018 assessment framework. Cham, Switzerland: Springer.
https://www.springer.com/gp/book/9783030193881

Fraillon, J., Ainley, J., Schulz, W., Friedman, T., & Duckworth, D. (2020). Preparing for life in a digital world: IEA
International Computer and Information Literacy Study 2018 international report. Cham, Switzerland: Springer.
https://www.springer.com/gp/book/9783030387808

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark:
Nielsen & Lydiche.


CHAPTER 2: 


ICILS 2018 test development 


Julian Fraillon 


Introduction 


The new content for inclusion in the ICILS 2018 assessment was developed over a 20-month
period from April 2015 to December 2016. Most of this work was conducted by the international
study center (ISC) at ACER in collaboration with national research coordinators (NRCs) and other
research partners.


The ICILS 2018 assessment included two tests. The test of computer and information literacy
(CIL) was completed by all students and the test of computational thinking (CT) was completed
by all students in countries that elected to undertake this additional assessment (see Chapter 1
for further details of country participation in ICILS).

This chapter provides a detailed description of the test development process and review
procedures as well as the test design implemented for the ICILS 2018 field trial and main survey.
The processes are relevant for the tests of both CIL and CT and are consequently described
together. Where relevant, details of the content of each test are described separately.

Table 2.1 provides an overview of the test development processes and timeline.


Test scope and format


ICILS 2018 assessment framework 


The ICILS student tests of CIL and CT were developed with reference to the ICILS 2018 assessment
framework¹ (Fraillon et al. 2019). CIL was defined as an “individual’s ability to use computers to
investigate, create, and communicate in order to participate effectively at home, at school, in
the workplace, and in society” (Fraillon et al. 2019, p. 16) and CT was defined as the “ability to
recognize aspects of real-world problems which are appropriate for computational formulation
and to evaluate and develop algorithmic solutions to those problems so that the solutions could
be operationalized with a computer” (Fraillon et al. 2019, p. 27).

CIL and CT are each described in the ICILS 2018 assessment framework in terms of strands
(overarching conceptual categories) and aspects (specific content categories within strands).
However, the described structures of CIL and CT were not intended to presuppose a sub-dimensional
analytic structure (see Fraillon et al. 2019 for further details).


The CIL framework 


The following list sets out the four strands and corresponding aspects of the CIL framework. Full 
details of the CIL construct can be found in the ICILS 2018 assessment framework. 


• Strand 1: Understanding computer use, comprising two aspects:
  - Aspect 1.1: Foundations of computer use
  - Aspect 1.2: Computer use conventions
• Strand 2: Gathering information, comprising two aspects:
  - Aspect 2.1: Accessing and evaluating information
  - Aspect 2.2: Managing information
• Strand 3: Producing information, comprising two aspects:
  - Aspect 3.1: Transforming information
  - Aspect 3.2: Creating information
• Strand 4: Digital communication, comprising two aspects:
  - Aspect 4.1: Sharing information
  - Aspect 4.2: Using information responsibly and safely


The CT framework

The following list sets out the two strands and corresponding aspects of the CT framework. Full
details of the CT construct can be found in the ICILS 2018 assessment framework.

• Strand 1: Conceptualizing problems, comprising three aspects:
  - Aspect 1.1: Knowing about and understanding digital systems
  - Aspect 1.2: Formulating and analyzing problems
  - Aspect 1.3: Collecting and representing relevant data
• Strand 2: Operationalizing solutions, comprising two aspects:
  - Aspect 2.1: Planning and evaluating solutions
  - Aspect 2.2: Developing algorithms, programs, and interfaces

1 The framework can be downloaded from: https://www.springer.com/gp/book/9783030193881


The ICILS test instruments

The CIL test instrument
The questions and tasks making up the ICILS CIL test instrument were presented in five modules,
each of which took 30 minutes to complete. Each student completed two modules randomly
allocated from the set of five. Three of the modules were secure trend modules first used in ICILS
2013 that provided a basis for reporting all CIL data collected in ICILS 2018 on the ICILS CIL
achievement scale that was established in 2013. Two of the modules were newly developed for
inclusion in ICILS 2018.
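The allocation rule, two of the five modules per student, can be sketched in Python as simple random sampling without replacement, with the draw order taken as the administration order. Module labels are hypothetical, and the sketch does not reproduce the actual ICILS rotation scheme, which is not described in this section and may have balanced module combinations and positions across students rather than drawing them independently; enumerating all 20 ordered pairs, as in the second function, is one possible starting point for such balancing.

```python
import random
from itertools import permutations
from typing import List, Sequence, Tuple

# Hypothetical labels for the five CIL modules: three 2013 trend modules and two new ones.
CIL_MODULES: Tuple[str, ...] = ("trend_1", "trend_2", "trend_3", "new_1", "new_2")


def allocate_modules(rng: random.Random,
                     modules: Sequence[str] = CIL_MODULES) -> Tuple[str, str]:
    """Draw two different modules at random; the draw order is the administration order."""
    first, second = rng.sample(list(modules), 2)
    return first, second


def all_module_pairings(modules: Sequence[str] = CIL_MODULES) -> List[Tuple[str, str]]:
    """Every ordered pair of distinct modules (5 x 4 = 20 possible test forms)."""
    return list(permutations(modules, 2))


if __name__ == "__main__":
    rng = random.Random(7)
    print([allocate_modules(rng) for _ in range(3)])
    print(len(all_module_pairings()))
```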


A module is a set of tasks based on an authentic theme and following a linear narrative structure.
Each module has a series of discrete tasks, each of which typically takes less than a minute to
complete, followed by a large task that typically takes 15 to 20 minutes to complete. The narrative
of each module positions the discrete tasks as a mix of skill-execution and information-management
tasks that students need to do in preparation for completing the large task.


When beginning each module, students were presented with an overview of the theme and purpose 
of the tasks in the module as well as a basic description of what the large task would comprise. 
Students were required to complete the tasks in the allocated sequence and could not return to 


review completed tasks. Table 2.2 includes a summary of the five ICILS CIL assessment modules
including the large tasks.

The ICILS CIL test modules included three broad categories of tasks, described below:

• Information-based response tasks: These tasks make use of the “digital interface to deliver pencil-and-paper style questions in a slightly richer format than traditional paper-based methods” (Fraillon et al. 2019, p. 46). The response formats for these tasks can vary (e.g., multiple choice, short constructed response, drag and drop), the stimulus material for these tasks is usually static or with minimal interactivity, and the purpose of these tasks is to “capture evidence of students’ knowledge and understanding of CIL independently of students using anything beyond the most basic skills required to record a response” (Fraillon et al. 2019, p. 46).


Table 2.1: Test development processes and timeline

Year | Month | Group | Activity
2015 | February | ICILS international study center | Establishment of CIL test specifications and preliminary CT test specifications
2015 | March | First meeting of national research coordinators (Krakow) | Reflections on ICILS 2013 test of CIL; review of proposed test development process and test specifications; test development workshop
2015 | March | ICILS international study center | Drafting, review, and refinement of test modules
2015 | December | National research coordinators | Web-based review of test module storyboards (1)
2016 | January | ICILS international study center | Revision of test module storyboards
2016 | February | Second meeting of national research coordinators (Amsterdam) | Review of draft test modules
2016 | March | ICILS international study center | Revision of draft test modules following second meeting of national research coordinators and pilot-testing of selected module content
2016 | April | National research coordinators | Web-based review of test module storyboards (2)
2016 | May | ICILS international study center | Revision of test modules and development of content in online IEA test delivery system
2016 | September | Third meeting of national research coordinators (Porto) | Review of modules proposed for inclusion in field trial test and confirmation of test design
2016 | October | ICILS international study center | Finalization of field trial test modules
2016 | November | ICILS field trial scoring trainers | Review of field trial scoring guides for constructed-response items and large tasks (as part of scoring training)
2016 | December | ICILS international study center | Revision of field trial scoring guides for constructed-response items and large tasks
2017 | July | ICILS international study center | Migration of test content to online delivery system (RM Results)
2017 | August | ICILS international study center | Analysis of field trial item data and recommendations for modules/items to be included in main survey test (field trial analysis report)
2017 | September | Fourth meeting of national research coordinators (Berlin) | Review of field trial analysis report and recommendations for test design and modules/items proposed for inclusion in main survey
2017 | December | ICILS main survey scoring trainers (Hamburg) | Review of main survey scoring guides for constructed-response items and large tasks (as part of scoring training)
2018 | January | ICILS international study center | Finalization of main survey scoring guides for constructed-response items and large tasks


• Skills tasks: In these tasks students “use interactive simulations of generic software or universal applications to complete an action” (Fraillon et al. 2019, p. 47). The number of steps required to complete a skills task and the number of different correct methods for executing it vary across skills tasks. Linear skills tasks require students to execute one or more commands in a given sequence (such as copy and paste), whereas nonlinear skills tasks require students to execute a function involving more than one sub-command without a single given sequence (such as using the filter functions in an online database to locate information).

• Authoring tasks: These tasks “require students to modify and create information products using authentic computer software applications” (Fraillon et al. 2019, p. 49). The complexity of authoring tasks varied according to the number of different applications students were required to use, the range of viable solutions to the task, and the amount of information students were required to evaluate and make use of when completing the task. The authoring tasks were most commonly the large task within each module and were typically scored using multiple analytic criteria applied by human scorers.


Further details of the ICILS CIL test modules, including example tasks, are presented in the ICILS 
2018 assessment framework (Fraillon et al. 2019) and the ICILS 2018 international report (Fraillon 
et al. 2020). 


Table 2.2: Summary of ICILS 2018 CIL test modules and large tasks


Module 


Description and large task 


Band competition 
(also used in ICILS 2013) 


Students plan a website, edit an image, and use a simple website builder to create a 
webpage with information about a school band competition. 


Breathing 
(also used in ICILS 2013) 


Students manage files and evaluate and collect information to create a presentation to 
explain the process of breathing to eight- or nine-year-old students. 


School trip 
(also used in ICILS 2013) 


Students help plan a school trip using online database tools and select and adapt 
information to produce an information sheet about the trip for their peers. 
The information sheet includes a map created using an online mapping tool. 


Board games 
(new for ICILS 2018) 


Students use a school-based social network for direct messaging and group posting to 
encourage peers to join a board games interest group. 


Recycling 
(new for ICILS 2018) 


Students access and evaluate information from a video sharing website to identify a 
suitable information source relating to waste reduction, reuse, and recycling. Students 
take research notes from the video and use their notes as the basis for designing an 
infographic to raise awareness about waste reduction, reuse, and recycling. 


The CT test instrument 


The tasks making up the ICILS CT test instrument were presented in two modules, each of which took 25 minutes to complete. Each student completed both modules (in randomized order across students) after completing both the CIL test and the student questionnaire. Both modules were newly developed for inclusion in ICILS 2018.

Similar to the CIL assessment, each CT test module contained a set of tasks linked by a common narrative theme. Unlike the CIL modules, however, the CT modules did not culminate in students completing a large task. Instead, each module comprised a set of items associated with the processes of planning and execution of computer-based solutions to real-world problems.

The first of the modules, automated bus, focused on content associated with the conceptualizing problems strand of the CT framework. The narrative theme of the module related to planning aspects of programmable decision-making to be implemented in a driverless bus, such as route planning and traveling at safe distances to avoid collisions. The tasks in this module included visual representation of real-world scenarios (through, for example, path diagrams, flow charts, and decision trees) and configuring, running, and interpreting results of a simulation tool.




The second module, farm drone, focused on content associated with the operationalizing solutions 
strand of the CT framework. The narrative theme of this module related to the use of visual code 
to control the actions of a drone used in farming. In the farm drone module students were able to 
return to earlier tasks to check and revise their responses. 


Like the CIL test modules, the CT test modules contain information-based response tasks and 
skills tasks. However, in addition to these, the CT test modules include task types that are unique 
to the CT assessment. The three CT-specific task types are described below. 


• Nonlinear systems transfer tasks: These tasks require students to “interpret, transfer and adapt algorithmic information so that the outcomes of the application of algorithmic instructions can be displayed visually” (Fraillon et al. 2019, p. 51). The response formats for these tasks can vary, but what they have in common is that students are required to make connections between a visual representation of an algorithmic sequence and the steps of the sequence described as text.

• Simulation tasks: These tasks require students to work with some form of simulation tool, typically as part of developing an understanding of real-world problems or to evaluate solutions to problems. The tasks can have students set parameters on the tool, run a simulation, collect data, and interpret the results.

• Visual coding tasks: These tasks require students to manipulate visual code blocks that can be used to execute a range of actions. In ICILS 2018, these tasks focused on managing code blocks that could control the movement and some basic actions of a virtual drone. Students could assemble the code blocks in a work space, rearrange the code blocks, and configure some aspects of the code blocks (such as the number of repeats in a loop or the material dropped by the drone). Students could also run the code at any time and view the activation of code blocks and the corresponding behavior of the drone. They could also separately reset both the drone and the code blocks/work space to their original states. There were two main forms of visual coding tasks:

i. Algorithm construction tasks require students to assemble sequences of code blocks in order for the drone to execute a prescribed set of actions.

ii. Algorithm debugging tasks require students to correct an existing flawed algorithm (provided to students as an editable configuration of code blocks in the work space) so that the drone can execute a prescribed set of actions.
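A minimal sketch may help show how the two forms of visual coding task differ. The Python below models a hypothetical, much-simplified drone work space. The block types (`Move`, `Drop`, `Repeat`), the grid coordinates, and the rule that a program is correct when its drops exactly match a prescribed list are all illustrative assumptions. They are not the ICILS delivery system or its scoring rules.

```python
from dataclasses import dataclass, field


# Hypothetical block types for a toy drone work space (not the ICILS block set).
@dataclass
class Move:
    dx: int  # cells moved east (+) or west (-)
    dy: int  # cells moved north (+) or south (-)


@dataclass
class Drop:
    material: str  # e.g. "water", "seed", or "fertilizer"


@dataclass
class Repeat:
    times: int                                # configurable number of repeats
    body: list = field(default_factory=list)  # blocks nested inside the loop


def run(blocks, start=(0, 0)):
    """Execute a block sequence and return the drops as (x, y, material)."""
    x, y = start
    drops = []

    def execute(sequence):
        nonlocal x, y
        for block in sequence:
            if isinstance(block, Move):
                x, y = x + block.dx, y + block.dy
            elif isinstance(block, Drop):
                drops.append((x, y, block.material))
            elif isinstance(block, Repeat):
                for _ in range(block.times):
                    execute(block.body)
            else:
                raise TypeError(f"unknown block: {block!r}")

    execute(blocks)
    return drops


# Prescribed outcome (hypothetical): water the three cells east of the start.
target = [(1, 0, "water"), (2, 0, "water"), (3, 0, "water")]

# Algorithm construction: blocks assembled from an empty work space.
constructed = [Repeat(3, [Move(1, 0), Drop("water")])]

# Algorithm debugging: a supplied program with a wrong repeat count...
flawed = [Repeat(2, [Move(1, 0), Drop("water")])]
# ...and the corrected configuration.
debugged = [Repeat(3, [Move(1, 0), Drop("water")])]

for name, program in [("constructed", constructed), ("flawed", flawed), ("debugged", debugged)]:
    print(name, run(program) == target)
```

In this framing, a construction task starts from an empty work space. A debugging task starts from a supplied program such as `flawed` and asks for the change (here, the repeat count) that produces the prescribed outcome.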


Further details of the ICILS CT test modules, including example tasks, are presented in the ICILS 
2018 assessment framework (Fraillon et al. 2019) and the ICILS 2018 international report (Fraillon 
et al. 2020). 


Test-development process 


The test-development process consisted of a series of stages applicable to the development of 
both the CIL and CT test instruments. Although these stages followed each other sequentially, the 
iterative and collaborative nature of the overall process meant that some materials were reviewed 
and revised within particular stages more than once. In summary, the ISC developed item materials
(sometimes based on suggestions from NRCs), which were reviewed by NRCs and then revised 
by the ISC. Sometimes this process was repeated. 


In ICILS, each task or item comprises the stimulus materials available to students, the question or
instructions given to students, the specified behavior of the computer delivery system in response 
to students’ actions, and the scoring logic (as specified in the scoring guides for human scoring) for 
each item or task. The test development and review process encompassed all of these constituent 
parts of the ICILS modules. 




Drafting of preliminary module ideas 


The first meeting of the ICILS 2018 NRCs included a reflection on the CIL test from ICILS 2013, a discussion of the potential for assessing CT in ICILS 2018, and a module development workshop in which participants were introduced to the questions and issues directing the creation and evaluation of the modules and tasks. These review criteria, which remained valid throughout the module development and review process, were applied by test development staff at the ISC and reviewers alike. The following lists present the main review questions used to evaluate the ICILS modules:


• Content validity
  - How did the material relate to the ICILS test specifications?
  - Did the tasks test the construct (CIL or CT) described in the assessment framework?
  - Did the tasks relate to content at the core of the aspects of the assessment framework or focus on trivial side issues?
  - How would the ICILS test content stand up to broader expert and public scrutiny?

• Clarity and context
  - Were the tasks and stimulus material coherent, unambiguous, and clear?
  - Were the modules and tasks interesting, worthwhile, and relevant?
  - Did the tasks assume prior knowledge and, if so, was this assumed to be acceptable or part of what the test intended to measure?
  - Was the reading load as low as possible without compromising the real-world relevance and validity of the tasks?
  - Were there idioms or syntactic structures that may prove difficult to translate into other languages?

• Test-takers
  - Did the content of the modules and tasks match the expected range of ability levels, age, and maturity of the ICILS target population?
  - Did the material appear to be cross-culturally relevant and sensitive?
  - Were specific items or tasks likely to be easier or harder for certain subgroups in the target population for reasons other than differences in the ability measured by the test?
  - Did the constructed-response items and the large-task information provide clear guidance about what was expected in response to the items and tasks?

• Format and scoring
  - Was the proposed format the most suitable for the framework content being assessed by each task?
  - Was the key (the correct answer to a multiple-choice question) indisputably correct?
  - Were the distractors (the incorrect options to a multiple-choice question) plausible but also irrefutably incorrect?
  - Did the scoring criteria for the large tasks assess the essential characteristics of task completion?
  - Were there different approaches to provide answers with the same score, and did they represent equivalent or different levels of proficiency?
  - Was the proposed scoring consistent with the underlying ability measured by the test (CIL), and would test respondents with higher ability levels always score better than those with lower ones?
  - Were there other kinds of answers that had not been anticipated in the scoring guides (e.g., any that did not fall within the “correct” answer category description, but appear to be equally correct)?
  - Were the scoring criteria sufficient for scorers and did they clearly distinguish the different levels of performance?


At the module development workshop, NRCs were invited to discuss and suggest new ideas for 
module themes. These themes were taken as a starting point for subsequent module development.


Development of module storyboards 


After the first NRC meeting, the ISC developed storyboards for six new modules (three each for CIL and CT).


The storyboards were presented in the form of a Microsoft PowerPoint mock-up of each task/item in sequence, so that those viewing the presentation could see the narrative sequence of tasks in the module.


Each PowerPoint slide contained the stimulus material together with the task instructions or 
question that students would be required to respond to, and each storyboard was accompanied 
by a set of implementation notes for each task or question. The notes described the planned 
functionality/behavior of each task and provided instructions on how the task was to be scored. 
Instructions were provided not only for the human-scored tasks but also for the tasks that would 
be scored automatically by the computer system. 


First online review of module storyboards 
In December 2015, NRCs took part in an online review of the draft storyboards. When reviewing 
the draft storyboards, NRCs made recommendations relating to the modules. The NRCs’ feedback 
informed further revision of the module storyboards and preparation for detailed discussion of 
the modules at the second meeting of NRCs. 


Face-to-face review of draft module storyboards 


One focus of the second meeting of NRCs was to review the six draft module storyboards. 


Following this meeting, four module storyboards (two CIL and two CT modules) were selected 
for further development and revision. Feedback from the meeting was used to inform the further 
revision of the four module storyboards. 


Second online review of draft module storyboards 


Once the module storyboards had been revised following the second meeting of NRCs, the 
storyboards were made available online to NRCs for further comment. The feedback on these 
modules was used to finalize the draft storyboards to be enacted within the online delivery system. 


Authoring the draft storyboards in the field trial online delivery system 


The finalized storyboards were provided to IEA to be authored into the test delivery system, which 
meant they could then be viewed, in draft form, with their expected functionality. This process 
served two purposes: it enabled the draft modules to be reviewed and refined with reference to 
their live functionality and it enabled the functionality of the test delivery system to be tested 
and refined. 


Face-to-face review of field-trial modules and finalization 

Operational versions of the proposed field trial modules were made available to NRCs for review 
at the third meeting of NRCs. The content and functionality of the field trial modules were revised 
and finalized in response to this review. 




Field-trial scorer training 

National center representatives attended an international scorer training meeting held before the 
field trial. These representatives subsequently trained the national center staff in charge of scoring 
student responses in their respective countries. Feedback from the scoring training process led 
to refinements to the scoring guides. 


Authoring the draft storyboards in the main survey online delivery system 


Following the field trial, the module content was provided to RM Results (formerly SONET Systems) 
to be authored into the main survey test delivery system. Further review and refinement of the 
content and function of the modules was conducted in this system. 


Field trial analysis review and selection of items for the main survey 


Field trial data were used to investigate the measurement properties of the ICILS test items at the 
international level and within countries. Having recommended which modules and tasks should 
be included in the main survey instrument, ISC staff discussed their recommendations with NRCs 
at the fourth ICILS NRC meeting. During the meeting, small refinements were recommended for 
a number of tasks. The NRCs also strongly recommended that all four test modules used in the 
field trial be retained for use in the main survey given that all four exhibited satisfactory validity, 
functioning, and measurement properties. 


Post-field trial revision 


Minor modifications were made to a small number of tasks, and the functionality of all tasks was further reviewed and refined. The main survey instruments, comprising five CIL test modules (three trend modules and two newly developed modules) and two CT test modules, were then finalized.


Main survey scorer training 


The main survey international scorer training meeting provided a final opportunity to reflect on 
the experience of scoring the field trial responses and to further review the scoring guides. The 
meeting was again attended by national center representatives who were responsible for training 
the national center staff in charge of scoring student responses in their respective countries. 
Feedback from the second international scorer training meeting along with student achievement 
data and the reported experiences of scorers during the field trial prompted further refinements 
to the scoring guides. 


Field trial test design and content 


Test design 


In this report we refer to tasks, items, and score points when describing the ICILS tests of CIL 
and CT. 


The term task refers to the instructions given to students and the actions required to complete them. 
As described previously, the ICILS test modules include a range of information-based response 
tasks, and skills, authoring, and coding tasks. 


The term item refers to the variable or variables derived using the scored student responses to 
each task that were used to create the scales of CIL and CT and to measure student achievement 
against them. Achievement on tasks, such as information-based response tasks, and skills and 
coding tasks, was typically measured using one item per task. Achievement on the authoring tasks 
was measured using many items (scoring criteria) per task. 


The term score points refers to the number of discrete non-zero score categories per item. The items associated with skills tasks typically elicited one score point each (e.g., correct = 1, incorrect = 0), whereas most of the items associated with information-based response tasks, authoring tasks, and coding tasks elicited more than one score point (e.g., full credit = 3, partial credit (high) = 2, partial credit (low) = 1, and no credit = 0).
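The score-point definition above can be expressed as a short calculation. This is a sketch only. The item names and their score categories are hypothetical examples that mirror the 0/1 and 0 to 3 schemes just described.

```python
# Hypothetical items, each listed with its possible score categories (0 = no credit).
items = {
    "skills_task": [0, 1],                 # incorrect = 0, correct = 1
    "large_task_criterion": [0, 1, 2, 3],  # no credit, partial (low), partial (high), full
    "constructed_response": [0, 1, 2],
}


def score_points(categories):
    """Count the discrete non-zero score categories of one item."""
    return len({c for c in categories if c != 0})


total = sum(score_points(c) for c in items.values())
print(total)  # 1 + 3 + 2 = 6
```

This count is why the number of score points for a test (for example, 102 for the main survey CIL items) exceeds its number of items.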


The CIL field trial test instrument consisted of five test modules with a total of 51 tasks, which yielded data for 86 items (all authoring tasks and a small number of skills tasks were assessed using more than one criterion, with each criterion constituting an item). The selection of tasks, including their format, was determined by the nature of the content the tasks were assessing, the tasks’ potential range of response types, and their role in the narrative flow of each module.

The ICILS research team had earlier decided not to have the same balance of task types and task formats within each module, but rather to have a mix of information-based response tasks, skills tasks, and authoring tasks across the five modules. Table 2.3 shows the composition of the field trial CIL test modules by items derived from the different task types. Overall, 30 percent of the items were derived from information-based response tasks, 22 percent from skills tasks, and 48 percent from authoring tasks.
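The task-type shares can be recomputed from the Total row of Table 2.3. The fourth column total is taken as 11, the sum of that column’s module rows (2 + 3 + 4 + 1 + 1). The column headings of Table 2.3 were not preserved, so this sketch assumes a grouping: columns 1 to 3 are information-based response formats, columns 4 and 5 are skills tasks, and column 6 is authoring (large) tasks. That grouping is consistent with the 86-item total.

```python
# "Total" row of Table 2.3 (field trial CIL items by column).
totals = [10, 14, 2, 11, 8, 41]

# Assumed grouping of the six columns into the three task types.
groups = {
    "information-based response": totals[0:3],
    "skills": totals[3:5],
    "authoring": totals[5:6],
}

n_items = sum(totals)  # 86
for task_type, counts in groups.items():
    share = 100 * sum(counts) / n_items
    print(f"{task_type}: {sum(counts)} items, {share:.0f}%")
```

Run as-is, the sketch prints 26 items (30%), 19 items (22%), and 41 items (48%).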


The CT field trial test instrument consisted of two test modules with a total of 21 tasks yielding data for 19 items (Table 2.4).2


Table 2.3: Field trial CIL test module composition by items derived from tasks 


Band competition 1 2 2 2 3 7 17
Breathing 0 3 0 3 0 10 16
School trip 3 0 0 4 1 7 15
Board games 3 4 0 1 0 6 14
Recycling* 3 5 0 1 4 11 24
Total 10 14 2 11 8 41 86
Note: *The recycling module included a short note-taking task and a communication task. In this table both are classified as authoring (large) tasks.


Table 2.4: Field trial CT test module composition by items derived from tasks 


Automated bus 1 1 4 2 0 0 8
Farm drone 0 2 1 0 6 2 11
Total 1 3 5 2 6 2 19


2 Data for two tasks were not used in the field trial scaling analysis. 


The test modules were delivered to students in a fully balanced complete rotation. Table 2.5 shows this design. In total, there were 60 different possible combinations of modules: 20 combinations for students completing the test of CIL only and 40 combinations for students completing both the tests of CIL and CT.

There were 20 possible combinations of the two CIL modules selected from the pool of five to be administered to all students. Each module appeared in eight module combinations (four times in the first position and four times in the second position). In countries completing only the CIL component of ICILS, each student completed one of the 20 CIL module combinations followed by the student questionnaire. In countries completing the CT option, each student completed one of the 20 possible CIL module combinations followed by the student questionnaire and then both CT modules presented in one of two possible sequences. Presenting each of the 20 CIL module combinations with one of two possible CT sequences resulted in 40 module combinations available for students to complete.

The term test form in Table 2.5 refers to each combination of modules that was used in the field trial.

Table 2.5: Field trial test form design and contents


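Field trial test form design

Test form | CIL combination | Questionnaire | CT order
1-20 | 1-20 | Q | -
21-40 | 1-20 | Q | AB:FD
41-60 | 1-20 | Q | FD:AB

CIL combination | CIL module position 1 | CIL module position 2
1 | B | H
2 | B | S
3 | B | G
4 | B | R
5 | H | S
6 | H | G
7 | H | R
8 | S | G
9 | S | R
10 | G | R
11 | H | B
12 | S | B
13 | G | B
14 | R | B
15 | S | H
16 | G | H
17 | R | H
18 | G | S
19 | R | S
20 | R | G

Key: B: Band competition; H: Breathing; S: School trip; G: Board games; R: Recycling; AB: Automated bus; FD: Farm drone; Q: Student questionnaire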
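The rotation can be generated programmatically. The sketch below uses the one-letter module codes from the key of Table 2.5. It builds the 20 ordered CIL pairs with `itertools.permutations`, attaches the questionnaire and the two CT orders, and checks the balance property that each CIL module appears four times in each position. The enumeration order produced by `permutations` differs from the numbering of combinations in Table 2.5. Only the set of combinations is the same.

```python
from collections import Counter
from itertools import permutations

# Module codes as in the key of Table 2.5.
CIL = ["B", "H", "S", "G", "R"]           # Band competition, Breathing, School trip, Board games, Recycling
CT_ORDERS = [("AB", "FD"), ("FD", "AB")]  # Automated bus, Farm drone

cil_pairs = list(permutations(CIL, 2))    # ordered pairs: 5 * 4 = 20

forms = []
for pair in cil_pairs:                    # CIL only: two CIL modules, then questionnaire
    forms.append((*pair, "Q"))
for ct in CT_ORDERS:                      # CIL + CT: both CT modules in one of two orders
    for pair in cil_pairs:
        forms.append((*pair, "Q", *ct))

first = Counter(p[0] for p in cil_pairs)
second = Counter(p[1] for p in cil_pairs)
print(len(cil_pairs), len(forms))         # 20 60
print(first["B"], second["B"])            # 4 4
```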




Field trial coverage of the CIL framework 


All field trial items were developed according to and mapped against the ICILS CIL framework. 
Table 2.6 shows this mapping. 


Table 2.6: Field trial CIL item mapping to the CIL framework 


CIL framework aspect | Items (n) | Items (%) | Score points (n) | Score points (%)
1.1 Foundations of computer use | 4 | 5 | 5 | 4
1.2 Computer use conventions | 9 | 10 | 10 | 8
2.1 Accessing and evaluating information | 16 | 19 | 24 | 19
2.2 Managing information | 9 | 10 | 11 | 9
3.1 Transforming information | 14 | 16 | 23 | 18
3.2 Creating information | 23 | 27 | 40 | 31
4.1 Sharing information | 7 | 8 | 9 | 7
4.2 Using information responsibly and safely | 4 | 5 | 6 | 5
Total | 86 | 100 | 128 | 100


As stated in the ICILS 2018 assessment framework, “[t]he test design of ICILS was not planned to assess equal proportions of all aspects of the CIL construct, but rather to ensure some coverage of all aspects as part of an authentic set of assessment activities in context” (Fraillon et al. 2019, p. 54). The intention that the four strands would be adequately represented in the test was achieved. Twelve percent of score points related to Strand 1, 27 percent to Strand 2, 49 percent to Strand 3, and 12 percent to Strand 4. These proportions corresponded to the amount of time students were expected to spend on each strand’s complement of tasks. Aspects 2.1, 3.1, and 3.2 were assessed primarily via the large tasks at the end of each module, with students expected to spend roughly two thirds of their working time on these tasks.


Field trial coverage of the CT framework 


All field trial items were developed according to, and mapped against, the ICILS CT framework. Table 2.7 shows this mapping.

Table 2.7: Field trial CT item mapping to the CT framework

CT framework aspect | Items (n) | Items (%) | Score points (n) | Score points (%)
1.1 Knowing about and understanding digital systems | 3 | 16 | 7 | 16
1.2 Formulating and analyzing problems | 3 | 16 | 5 | 12
1.3 Collecting and representing relevant data | 3 | 16 | 4 | 9
2.1 Planning and evaluating solutions | 4 | 21 | 10 | 23
2.2 Developing algorithms, programs, and interfaces | 6 | 32 | 17 | 40
Total | 19 | 100 | 43 | 100




As stated in the ICILS assessment framework, “[t]he test design of ICILS 2018 was not planned to assess equal proportions of all aspects of the CT construct, but rather to ensure some coverage of all aspects as part of an authentic set of assessment activities in context” (Fraillon et al. 2019, p. 54).

Table 2.7 shows that, while there was a similar number of items addressing each of Strands 1 and 2 of the CT framework, a majority (63%) of score points collected related to Strand 2. This is because the quality of students’ responses to algorithm construction and algorithm debugging tasks (in the farm drone module) was typically assessed using partial credit criteria (of up to three score points each) that combined measures of the accuracy and elegance of students’ coding solutions for each task. Detailed information about scoring the algorithm construction and algorithm debugging tasks is included in the section of this chapter describing the main survey test design and content.

Selection of CIL and CT test content for main survey

As stated previously, the field trial data were used to investigate the measurement properties of the ICILS test items, and ISC staff made recommendations on which modules and tasks should be included in the main survey instrument. Chapter 11 of this report describes the analysis procedures used to review the measurement properties.

Also as mentioned previously, ISC staff recommended refinements to a small number of tasks and to the scoring of some tasks. A task was only refined when data provided clear evidence of a problem with the task and if there was strong agreement that the refinement would very likely improve the task’s measurement properties. After consulting with NRCs, the ISC decided to retain all five CIL field-trial test modules and the two CT test modules for use in the main survey.

Feedback from NRCs and observers of the field trial suggested that the total testing time was too long for students completing both the CIL and CT tests and that many students were finishing the automated bus module in far less than the allocated 30 minutes. As a result, it was agreed that the time allocated to each CT module would be reduced to 25 minutes and that the farm drone module would be shortened by removing two final constructed-response items that required students to provide reflections on coding solutions. It was also agreed that the farm drone module would benefit from an increase in the number of algorithm debugging tasks. Two algorithm construction tasks from the field trial were converted to algorithm debugging tasks for use in the main survey.

Main survey test design and content

Test design

The main survey CIL test instrument consisted of five test modules with a total of 46 tasks, which yielded data for 81 items. Some of these tasks generated a number of score points based on the criteria that were applied to the large tasks. Table 2.8 shows the composition of the main survey test modules by items derived from the different task types. The items shown in Table 2.8 are those that were used in the scaling and analysis of the ICILS 2018 main survey CIL test data. Overall, 27 percent of the items were derived from information-based response tasks, 22 percent from skills tasks, and 51 percent from authoring tasks.

The CT main survey test instrument consisted of two test modules with a total of 17 tasks, which yielded data for 17 items. The CT test emphasized the application of skills, with 15 of the 17 items derived from skills tasks (Table 2.9).

Students received the test modules in the same fully balanced complete rotation that was used in the field trial. Table 2.10 shows this design for the main survey. As before, the term test form refers to each combination of modules used in the main survey.




Table 2.8: Main survey CIL test module composition by items derived from tasks 


Band competition 1 2 2 2 3 7 17
Breathing 0 2 0 3 0 10 15
School trip 1 0 0 4 1 7 13
Board games 2 4 0 1 1 5 13
Recycling* 3 5 0 1 2 12 23
Total 7 13 2 11 7 41 81
Note: *The recycling module included a short note-taking task and a communication task. In this table both are classified as large tasks.


Table 2.9: Main survey CT test module composition by items derived from tasks 


Automated bus 1 1 4 2 0 0 8


Farm drone 


Total | 1 1 5 | D 5 a 17 


Table 2.10: Main survey test form design and contents

Main survey test form design

Test form | CIL combination | Questionnaire | CT order
1-20 | 1-20 | Q | -
21-40 | 1-20 | Q | AB:FD
41-60 | 1-20 | Q | FD:AB

CIL combination | CIL module position 1 | CIL module position 2
1 | B | H
2 | B | S
3 | B | G
4 | B | R
5 | H | S
6 | H | G
7 | H | R
8 | S | G
9 | S | R
10 | G | R
11 | H | B
12 | S | B
13 | G | B
14 | R | B
15 | S | H
16 | G | H
17 | R | H
18 | G | S
19 | R | S
20 | R | G

Key: B: Band competition; H: Breathing; S: School trip; G: Board games; R: Recycling; AB: Automated bus; FD: Farm drone; Q: Student questionnaire


Main survey coverage of the CIL framework 


All main survey items were developed according to and mapped against the ICILS CIL framework. 
Table 2.11 shows this mapping.


Table 2.11: Main survey CIL item mapping to the CIL framework 


CIL framework aspect | Items (n) | Items (%) | Score points (n) | Score points (%)
1.1 Foundations of computer use | 2 | 2 | 2 | 2
1.2 Computer use conventions | 11 | 14 | 13 | 13
2.1 Accessing and evaluating information | 14 | 17 | 16 | 16
2.2 Managing information | 8 | 10 | 8 | 8
3.1 Transforming information | 15 | 19 | 20 | 20
3.2 Creating information | 21 | 26 | 31 | 30
4.1 Sharing information | 7 | 9 | 8 | 8
4.2 Using information responsibly and safely | 3 | 4 | 4 | 4
Total | 81 | 100 | 102 | 100




A comparison of Tables 2.6 and 2.11 reveals that the final test instrument provided very similar 
CIL framework coverage to that of the field trial instrument. 


The 81 main survey items yielded 102 score points for inclusion in the item analysis and scaling. 
Overall, 15 percent of score points were derived from items corresponding to Strand 1, 24 percent
to Strand 2, 50 percent to Strand 3, and 12 percent to Strand 4.
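The strand percentages can be checked against Table 2.11 by summing score points over the aspects within each strand. The following is a minimal Python sketch that uses only the score-point counts from that table. It assumes that the aspect number before the dot identifies the strand.

```python
# Score points per CIL aspect, main survey (Table 2.11).
score_points = {
    "1.1": 2, "1.2": 13,
    "2.1": 16, "2.2": 8,
    "3.1": 20, "3.2": 31,
    "4.1": 8, "4.2": 4,
}
total = sum(score_points.values())  # 102

by_strand = {}
for aspect, points in score_points.items():
    strand = aspect.split(".")[0]  # "3.2" -> "3"
    by_strand[strand] = by_strand.get(strand, 0) + points

for strand, points in sorted(by_strand.items()):
    print(f"Strand {strand}: {points} points, {100 * points / total:.0f}%")
```

Rounded to whole percentages, the shares come out as 15, 24, 50, and 12. Because each share is rounded separately, the four figures sum to 101 rather than 100.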


Main survey coverage of the CT framework 


All main survey items were developed according to and mapped against the ICILS CT framework. 
Table 2.12 shows this mapping. 


Table 2.12: Main survey CT item mapping to the CT framework 


CT framework aspect | Items (n) | Items (%) | Score points (n) | Score points (%)
1.1 Knowing about and understanding digital systems | 3 | 18 | 7 | 18
1.2 Formulating and analyzing problems | 2 | 12 | 4 | 10
1.3 Collecting and representing relevant data | 3 | 18 | 5 | 13
2.1 Planning and evaluating solutions | 5 | 29 | 12 | 31
2.2 Developing algorithms, programs, and interfaces | 4 | 24 | 11 | 28
Total | 17 | 100 | 39 | 100


A comparison of Tables 2.7 and 2.12 reveals that the final test instrument provided similar CT
framework coverage to that of the field trial instrument. The relative decrease in the coverage of
aspect 2.2 and increase in the coverage of aspect 2.1 between the field trial and main survey
instruments was largely the result of converting two field trial algorithm construction tasks into
algorithm debugging tasks for the main survey. Other small changes in coverage by aspect
related to the removal of two constructed-response evaluation tasks from the farm drone module
following the field trial.


The 17 main survey items yielded 39 score points for inclusion in the item analysis and scaling. 
Overall 41 percent and 59 percent of score points were derived from items corresponding to 
Strands 1 and 2 respectively. 
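The same arithmetic can be used to check the CT figures: each row percentage in Table 2.12 and the strand split quoted above are rounded shares of the 17 items and 39 score points. A short sketch, with illustrative names only:

```python
# (items, score points) per CT framework aspect, transcribed from Table 2.12.
ct_aspects = {
    "1.1": (3, 7),
    "1.2": (2, 4),
    "1.3": (3, 5),
    "2.1": (5, 12),
    "2.2": (4, 11),
}

def pct(part: int, whole: int) -> int:
    """Whole-number percentage, rounded as in the report tables."""
    return round(100 * part / whole)

n_items = sum(i for i, _ in ct_aspects.values())
n_points = sum(p for _, p in ct_aspects.values())

# Row percentages: (items %, score points %) for each aspect.
rows = {a: (pct(i, n_items), pct(p, n_points)) for a, (i, p) in ct_aspects.items()}

# Strand totals: the strand is the first character of the aspect code.
strand_points = {}
for aspect, (_, p) in ct_aspects.items():
    strand_points[aspect[0]] = strand_points.get(aspect[0], 0) + p
strands = {s: pct(p, n_points) for s, p in strand_points.items()}

print(rows)     # aspect 2.1: 12 of 39 score points, about 31 percent
print(strands)  # about 41 percent (Strand 1) and 59 percent (Strand 2)
```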


Scoring the main survey CT algorithm construction and algorithm debugging tasks 

CT items 

The CT test was new to ICILS and represented the first time CT had been assessed in a
cross-national large-scale assessment context. The quality of students’ responses to visual coding tasks
has not previously been assessed in this context, and consequently we have included this section
in the technical report to explain the conceptual basis and operationalization of the scoring of
students’ visual code algorithms in ICILS 2018.


The algorithm construction tasks required students to select commands (available as visual code
blocks) and place them in sequence in order for the farm drone to complete specified tasks. The
tasks included any or all of the following actions: having the drone move to specified locations on a
grid, having the drone drop any of water, seed, or fertilizer, and applying conditional logic relating
to the size of the crops on which an action may be conducted (using a simple but configurable “if...
do” command). After two simple warm-up tasks, students were also introduced to, and able to
make use of, a configurable “repeat” command used to complete multiple iterations of commands.


The algorithm debugging tasks required students to edit an existing configuration of commands
in order for the farm drone to complete specified tasks. The algorithm debugging tasks were
developed such that one or two minor modifications to the command configuration could result in
the drone completing the actions as required. However, there were no restrictions in the debugging
tasks on the nature of the changes students could make to the commands and their configuration.
Students could, if they wished, remove all the existing commands and develop a completely new
set of code.


The tasks became progressively more complex as students worked through the module. Task
complexity related to the number of actions the drone needed to complete and the number,
type, and configuration of targets onto which the drone needed to drop any of water, seed, or
fertilizer.


The quality of students’ coding solutions to both algorithm construction and debugging tasks was
conceptualized in terms of two main criteria. The first criterion related to the correctness of any
given solution and the second was the efficiency (or elegance) of the solution. Ultimately we derived
measures for each criterion and then combined these into a single measure of the quality of the
solution to each coding task. As we defined, specified, and operationalized the measures for each
criterion, it was possible to have the computer-based delivery system generate the relevant data
for each student response. So while the identification of variables and scores to measure the quality
of coding was determined by the research team, the individual scores for student responses were
ultimately generated through the test delivery system. The method for scoring each criterion and
then combining the scores to form a single score for each task is described below.


Scoring the correctness of coding solutions 


The correctness of a solution was characterized by the degree to which the behavior of the drone
in response to a student’s coding solution matched the required behavior of the drone specified
in the task instructions. In a small number of simple tasks, the instruction was that the drone fly
to a specified location. For these tasks, the correctness was measured only in terms of whether or
not the drone ended up at the specified location. For the majority of tasks the drone was required
to both fly to specified locations and drop any of water, seed, or fertilizer on these targets. For
these tasks, the correctness of the solution was initially scored according to three sub-criteria that
comprised two variables (correct targets and incorrect targets):

i. the number of targets with the correct materials on them (correct targets: 2 score points per
target with correct materials)

ii. the number of targets with incorrect materials on them (correct targets: 1 score point per target
with incorrect materials)

iii. whether any materials were dropped on a square that was not a target (incorrect targets: 1
score point for no materials on any incorrect targets, 0 score points for any materials on any
incorrect target).
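The three sub-criteria above can be sketched as a function that reduces a finished drone run to the two variables. The grid coordinates, the dictionaries describing required and dropped materials, and the rule that a target holding any wrong material earns 1 point are all assumptions for illustration; the report does not describe how the delivery system represented drone state.

```python
from typing import Dict, FrozenSet, Set, Tuple

Square = Tuple[int, int]  # hypothetical (row, column) grid coordinate

def correctness_variables(
    required: Dict[Square, FrozenSet[str]],
    dropped: Dict[Square, Set[str]],
) -> Tuple[int, int]:
    """Return (correct targets score, incorrect targets score) for one response.

    Sub-criterion i:   2 points per target holding exactly its required materials.
    Sub-criterion ii:  1 point per target holding any other (incorrect) materials.
    Sub-criterion iii: 1 point if nothing was dropped on a non-target square, else 0.
    """
    correct_targets = 0
    for square, materials in required.items():
        on_square = dropped.get(square, set())
        if on_square == set(materials):
            correct_targets += 2
        elif on_square:
            correct_targets += 1
    stray_drop = any(mats for sq, mats in dropped.items() if sq not in required)
    incorrect_targets = 0 if stray_drop else 1
    return correct_targets, incorrect_targets

# Hypothetical task: water four squares in the top row of the grid.
targets = {(0, c): frozenset({"water"}) for c in range(4)}
print(correctness_variables(targets, {sq: {"water"} for sq in targets}))  # (8, 1)
```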


Each response was first scored for each variable (correct targets and incorrect targets). Following
this, the frequencies of the scores were examined together with consideration of the descriptions of
combinations of these scores. This was an iterative process used to establish a scoring logic for the
correctness of each task. This scoring logic included combining the correct targets and incorrect
targets scores into a single initial combined correctness score for each task. Once the scoring logic
for the correctness of each task was established, we completed a Rasch item response theory
scaling analysis (Rasch 1960) (see Chapter 11 for further detail) to establish the measurement
viability of the scoring logic and, where appropriate, we recoded scoring categories for selected
items. Table 2.13 shows an example operationalization of this scoring logic for a task, together
with the recoding of categories following initial scaling. For each task, the score for the completely
correct response remained as a unique and highest score category. The recoding (combination)
of score categories for partially correct responses varied across tasks and was informed by the
response frequencies, the substantive differences in response quality, and the degree to which
these differences were reflected in the student data.


Table 2.13: Example scoring of the correctness of student coding responses 




Response description (combining targets and incorrect targets) | Targets score | Incorrect targets score | Initial combined score | Recoded combined score
All 4 targets reached with only correct materials and no incorrect targets | 8 | 1 | 4 | 2
All 4 targets reached with only correct materials and one or more incorrect targets | 8 | 0 | 3 | 1
At least 2 targets with only the correct materials OR all 4 targets with the incorrect material, and no incorrect targets | 4, 5, 6, or 7 | 1 | 2 | 1
At least 2 targets with only the correct materials OR all 4 targets with the incorrect material, and one or more incorrect targets | 4, 5, 6, or 7 | 0 | 1 | 0
Fewer than 2 targets met | Fewer than 4 | 0 or 1 | 0 | 0
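The example in Table 2.13 can be expressed as two small lookups: one from the pair of variable scores to the initial combined score, and one for the recoding applied after initial scaling. This sketch hard-codes the thresholds of this single four-target example; as noted above, each task had its own scoring logic and recoding, so the function names and values are illustrative rather than general.

```python
def initial_combined_score(correct_targets: int, incorrect_targets: int) -> int:
    """Initial combined correctness score for the four-target example task."""
    if correct_targets == 8:                 # all 4 targets with only correct materials
        return 4 if incorrect_targets == 1 else 3
    if 4 <= correct_targets <= 7:            # at least 2 correct, or all 4 with wrong material
        return 2 if incorrect_targets == 1 else 1
    return 0                                 # fewer than 2 targets met

# Recoding of initial categories following the initial Rasch scaling.
RECODE = {4: 2, 3: 1, 2: 1, 1: 0, 0: 0}

def recoded_correctness(correct_targets: int, incorrect_targets: int) -> int:
    return RECODE[initial_combined_score(correct_targets, incorrect_targets)]

print(recoded_correctness(8, 1), recoded_correctness(6, 1), recoded_correctness(3, 1))  # 2 1 0
```

Keeping the recoding as a separate table mirrors the process described in the text: the initial categories are fixed first, and the collapsing of partial-credit categories is decided afterwards from the scaling results.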


Scoring the efficiency of coding solutions 


The second criterion used to score the student coding responses related to the efficiency (or
elegance) of students’ responses. We explored two main approaches when considering the
efficiency of the code in students’ responses. The first was a simple count of the number of
commands in each response. For each task, the most efficient response was defined as one that
included the smallest number of commands needed to complete a completely correct response (for
each task in the farm drone module, students were instructed to use as few commands as possible
in their solution). For example, if a task required the drone to move forward five times, this was
most commonly achieved by students using either five “move forward” commands in sequence or
one “repeat” command (configured to 5 repeats) and one “move forward” command. The latter
solution, which is the most efficient, uses two commands. In theory other configurations are also
possible, such as using one “move forward” command and a “repeat” command (configured to
4 repeats) with an additional “move forward” command. This solution would use three commands.
For each task, given its requirements, it was possible to identify clusters of the number of commands
that most likely corresponded to different approaches to solving the task. For each task we created
an efficiency score hierarchy based on the number of commands used in the solution, informed by
the efficiency characteristics of solutions that could be associated with different numbers of
commands and by the frequencies of the number of commands across the responses.

The second approach was to examine the degree to which students used embedded commands in
their code solutions, such as loops within loops. This was considered for its potential to represent
the elegance of the substance of a coding solution. For the more complex tasks it was possible
to observe some variation in the degree to which students embedded commands, and in an early
phase of the analysis we included an elegance variable derived from the degree to which embedded
commands were used.

Once we had established efficiency scores for responses to each task and elegance scores for
responses to the more complex tasks, we included these data in the scaling model (see Chapter
11). We found that the data from the elegance variable did not contribute to the quality of the
measurement of student CT and at this point decided to use only the efficiency data to contribute
to the scoring. Table 2.14 shows an example operationalization of the scoring logic in establishing
the efficiency scores for a task, together with the recoding of categories following initial scaling.


Table 2.14: Example scoring of the efficiency of student coding responses 


Comment on efficiency | Commands | Responses (%)* | Code
Minimum necessary commands | 6 | 28 | 3
Repeat with additional commands | 7-10 | 16 | 2
14 commands was the minimum necessary if not using the repeat | 11-15 | 41 | 1
More than 15 commands showed a good deal of redundancy | >15 | 6 | 0


Note: * These percentages include all responses using the given number of commands regardless of their correctness. 
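Both efficiency approaches described above can be sketched over a simple program representation: a count of command blocks (the measure retained for scoring) and the depth of embedded repeats (the elegance measure that was explored and later dropped). The representation, with plain commands as strings and a repeat block as a `("repeat", times, body)` tuple, is a hypothetical stand-in, since the report does not describe how the delivery system stored student code. The thresholds in `efficiency_code` are those of the example task in Table 2.14 only, and treating fewer than 6 commands as code 3 is an assumption (such responses cannot be fully correct, so their efficiency does not affect the final score).

```python
from typing import List, Tuple, Union

# Hypothetical program representation: a plain command is a string;
# a repeat block is ("repeat", times, body).
Command = Union[str, Tuple[str, int, list]]

def count_commands(program: List[Command]) -> int:
    """Count every command block, including each repeat block and the blocks inside it."""
    total = 0
    for cmd in program:
        total += 1
        if isinstance(cmd, tuple):
            total += count_commands(cmd[2])
    return total

def nesting_depth(program: List[Command]) -> int:
    """Deepest level of embedded repeat blocks (0 means no repeats were used)."""
    depths = [1 + nesting_depth(cmd[2]) for cmd in program if isinstance(cmd, tuple)]
    return max(depths, default=0)

def efficiency_code(n_commands: int) -> int:
    """Map a command count to the efficiency codes of the Table 2.14 example task."""
    if n_commands <= 6:
        return 3
    if n_commands <= 10:
        return 2
    if n_commands <= 15:
        return 1
    return 0

# The three "move forward five times" solutions discussed in the text.
in_sequence = ["move forward"] * 5
one_repeat = [("repeat", 5, ["move forward"])]
repeat_plus_move = ["move forward", ("repeat", 4, ["move forward"])]
print([count_commands(p) for p in (in_sequence, one_repeat, repeat_plus_move)])  # [5, 2, 3]
```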


Combining the correctness and efficiency scores to establish a single score for each coding task

The final step of establishing scores for each coding task was to combine the correctness and 
efficiency scores. This was done by using the efficiency score for each task to adjust the correctness 
score. However, we decided that the efficiency of a solution should only be used to adjust the 
highest solution score for each task. Operationally this meant that the efficiency of code was not 
considered to be important in evaluating the quality of incorrect or partially correct responses, 
but that it was considered to be important in identifying differences in the quality of fully correct 
responses. 


We created a new composite variable (correctness and efficiency) for each coding task. For this
variable, the score for fully correct responses was adjusted such that:

• Fully correct responses with the highest efficiency score were adjusted to one score point
higher than the highest correctness score;

• Fully correct responses with a partial credit (moderate) efficiency score were not adjusted;
and

• Fully correct responses with a zero efficiency score were adjusted to one score point lower
than the highest correctness score.


The correctness scores for all partially correct or incorrect responses were unchanged when
included in the correctness and efficiency composite variable. Table 2.15 shows an example of how
correctness and efficiency scores were combined to form the single composite variable.
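The adjustment rules above amount to a small function. This is a sketch, not the delivery system's code: the function name is hypothetical, and `None` stands in for the "N/A" efficiency of responses that are not fully correct. With a maximum correctness score of 2 and a maximum efficiency score of 3, it reproduces the combined (CE) scores of the example in Table 2.15.

```python
from typing import Optional

def ce_score(correctness: int, efficiency: Optional[int],
             max_correctness: int, max_efficiency: int) -> int:
    """Combine correctness and efficiency into a single composite score."""
    if correctness < max_correctness:
        return correctness          # partially correct or incorrect: efficiency ignored
    if efficiency is None:
        raise ValueError("fully correct responses need an efficiency score")
    if efficiency == max_efficiency:
        return correctness + 1      # most efficient: one point above the top correctness score
    if efficiency == 0:
        return correctness - 1      # least efficient: one point below it
    return correctness              # moderate (partial credit) efficiency: unchanged

print([ce_score(2, e, 2, 3) for e in (3, 2, 1, 0)])  # [3, 2, 2, 1]
print(ce_score(1, None, 2, 3), ce_score(0, None, 2, 3))  # 1 0
```

Note that an inefficient but fully correct response ends up with the same composite score (1) as a partially correct one, which is the intended effect of the rules above.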


Released CIL test module and CT tasks 


One CIL test module, band competition, has been released since publication of the ICILS
international report (Fraillon et al. 2020). This module required students to work on a sequence of
tasks associated with planning a website to promote a band competition within a school. The large
task for this module required students to add content to a page on the band competition website.


The large task in this module presented students with a description of the task details as well as 
information about how the task would be assessed. The description was followed by a short video 
designed to familiarize students with the task and highlight the main features of the software they 
would need to use to complete the task. A detailed description of the module appears on pages 60 
to 74 of the ICILS 2018 international report (Fraillon et al. 2020). 




Table 2.15: Example scoring of correctness and efficiency combined 


Correctness score | Efficiency score | Conceptual description of correctness and efficiency | Recoded combined score (CE)
2 | 3 | All 4 targets using only the correct material with no irrelevant targets using no more than 6 commands | 3
2 | 2 or 1 | All 4 targets using only the correct material with no irrelevant targets using between 7 and 15 commands | 2
2 | 0 | All 4 targets in one row reached using only the correct material with no irrelevant targets using more than 15 commands | 1
1 | N/A | At least 2 targets with only the correct materials and no irrelevant targets using any number of commands OR all 4 targets with the incorrect material and no irrelevant targets using any number of commands | 1
0 | N/A | Fewer than 2 targets met | 0


Four tasks from the CT test have also been released. Two tasks were taken from the automated
bus module and illustrate the configuration of a decision tree within a simulation. Two tasks were
taken from the farm drone module and illustrate algorithm construction and algorithm debugging.
Detailed descriptions of these tasks appear on pages 974 to 101 of the ICILS international report
(Fraillon et al. 2020).


References 


Fraillon, J., Ainley, J., Schulz, W., Duckworth, D., & Friedman, T. (2019). IEA International Computer and
Information Literacy Study 2018 assessment framework. Cham, Switzerland: Springer. https://www.springer.com/gp/book/9783030193881

Fraillon, J., Ainley, J., Schulz, W., Friedman, T., & Duckworth, D. (2020). Preparing for life in a digital world: IEA
International Computer and Information Literacy Study 2018 international report. Cham, Switzerland: Springer.
https://www.springer.com/gp/book/9783030387808

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark:
Nielsen & Lydiche.




CHAPTER 3: 


Computer-based assessment systems 


Julian Fraillon, Ralph Carstens, and Sebastian Meyer 


Introduction 


ICILS 2018 collects student achievement and questionnaire data on computer rather than on
paper. This chapter describes the key aspects of the computer-based test delivery system used
in ICILS 2018. It also details some of the challenges of using computer-based systems to collect
student data in a cross-national large-scale assessment such as ICILS 2018.

The focus of the chapter is on the overall approach, architecture, and design of the computer-based
assessment (CBA) system suite as well as the relationship of these to other technical systems
used for the survey operations. Details and procedural aspects are provided in other chapters:
assessment and test design in Chapter 2, translation and adaptation in Chapter 5, field operations
in Chapter 8, data flow and integration in Chapter 10, and scaling and analysis of the test materials
in Chapter 11.


ICILS 2018 computer-based components and architecture


Pre-existing components


IEA has been using computer-based systems for a number of years in order to organize and support
countries to coordinate field operations and collect questionnaire data from teachers and schools.
These systems, developed by IEA, include the following:

• The IEA Windows Within-School Sampling Software (IEA WinW3S), which supports countries to
manage within-school sampling and test administration procedures;

• The IEA Online Survey System (IEA OSS), which supports the translation, adaptation, and
subsequent delivery of computer-based questionnaire material;

• The IEA Data Management Expert (IEA DME), which is used to capture data from paper-based
questionnaire material and to integrate and verify national databases;

• The IEA Coding Expert, which is used to code the student responses with respect to parental
occupation; and

• The IEA Translation System, which is used to translate and review the school, ICT coordinator,
and teacher questionnaires online.

All five software components were used in ICILS 2018 once they had undergone significant
customization and/or extension so that they would suit the specific study context.

Data from paper- and computer-based test and questionnaire components were transformed and
integrated using the pre-existing IEA Data Processing Expert software. Chapter 10 provides more
information on this process.

As in ICILS 2013, RM Results (previously SONET Systems) provided the software components
directly relating to the collection of student data via computer for ICILS 2018. These software
components are part of a broader suite known as AssessmentMaster (AM). The software modules
specifically relating to the delivery of the ICILS 2018 student test and questionnaire include an
administration module (AM Manager), a translation module (AM Designer), a delivery engine
module (AM Examiner), and a scoring module (AM Marker).

Three ICILS 2018 student test modules were already administered in ICILS 2013. They are usually
referred to as trend modules.




Components developed for ICILS 2018 
Apart from general updates and improvements in all software packages used in ICILS 2018, four
new test modules were introduced: two modules for the computer and information literacy (CIL)
test and the two modules comprising the new computational thinking (CT) test.


Procedures and software user manuals

A series of detailed manuals guides the work of national center staff during IEA studies. These
manuals are known as Survey Operations Procedures (SOP) manuals. IEA, in close cooperation
with the entire international team, has developed and refined instructions and guidelines for IEA
WinW3S, IEA OSS, IEA DME, and the IEA Translation System and included these in the suite of
SOP materials. Each user manual is tailored to the needs of each project (see Chapter 8 for details).

During ICILS 2018, a separate user manual was developed for the translation module, while
instructions on how to use the scoring and administration modules were included within the SOP
materials. The manuals relating to the delivery engine module were incorporated in the manuals
for school coordinators and test administrators who were administering the tests in schools.


ICILS 2018 system architecture 


Table 3.1 provides an overview of the ICILS 2018 computer-based system and its supporting and
accompanying systems. The table also shows how these systems interfaced and interacted. The
table is organized by the major components of the system and the ICILS 2018 target populations.

As can be seen, ICILS 2018 used a mixture of pre-existing and newly developed and/or customized
components and tools to support the preparation, administration, and post-processing of the study.


Developing the computer-based delivery platform for ICILS 2018 


Developing the delivery engine 


Development of the computer-based delivery platform took place in parallel with test development
(see Chapter 2 for details of the test development). The three trend test modules, i.e., the ones
administered in ICILS 2013, were already available and largely ready-to-use. In response to
some changes in the delivery system, AM Examiner, RM Results updated aspects of the backend
functioning of the trend modules. These changes did not affect the user experience of the modules
in 2018 in comparison to 2013. Two new CIL modules and two CT modules were developed for
ICILS 2018. Once the new test modules’ storyboards had been completed, they were sent to RM
Results for authoring into the delivery platform. This process was an iterative one wherein the
necessary functionality of each task in the test was reviewed and refined once it had been enacted.
Each task underwent many such iterations.


The test delivery engine was web-based, which meant that technically the tests could be delivered
over the internet. However, the ICILS 2018 research team decided, for operational reasons, that the
tests should be administered on USB drives (one per computer), each containing a self-contained
web-server to deliver the web-based test content. Although the USB-based delivery engine was
Windows-based, it could be run using some forms of Windows emulation on Mac OS X or Linux.

The USB hosted a portable version of the Mozilla Firefox browser. Accordingly, all ICILS 2018
materials were developed, rendered, and tested using Mozilla Firefox and Google Chrome. The
research team recommended that the translation, scoring, and administration modules be accessed
only through Firefox (although Google Chrome was also an option). As part of the field-operation
procedures, national centers needed to ensure that session data from each USB drive were
uploaded to a central database as soon as practicable after each test session.




Table 3.1: ICILS computer-based or computer-supported systems and operations 




Process | Student test and questionnaire | Principal, ICT coordinator, and teacher questionnaires (online) | Principal, ICT coordinator, and teacher questionnaires (paper)
Authoring/master instruments | AssessmentMaster authoring from storyboards | IEA OSS admin module, transfer from IEA Translation System | Microsoft Word, direct authoring of paper-based questionnaires
Translation | AM Designer, web-based | IEA Translation System | Microsoft Word, desktop-based
Sample selection and ID provisioning | IEA WinW3S (all instruments)
Delivery systems | AM Examiner | IEA OSS delivery module, web-based | Personalized questionnaire prints
Initialization | Student ID, password, and language information from IEA WinW3S | Respondent ID and password from IEA WinW3S | Labels with respondent ID from IEA WinW3S
Primary delivery mode | USB sticks: Windows-based on local school computers; proctored | Any internet browser: self-administered | Paper: self-administered
Alternative delivery mode(s) | Laptop server mode or carry-in laptop sets (where school infrastructure insufficient) | Paper questionnaires (where infrastructure insufficient) | N/A
Data capture | Directly onto USB sticks; alternatively, local laptop servers (student by student) | Directly into central database (all respondents) | Manual (human) data entry using IEA DME software after administration (all respondents)
Data merging (across respondents) | Upload to AM Manager | N/A | Merging from multiple IEA DME databases (where used)
Data scoring | AM Marker | N/A | N/A
Data management | IEA WinW3S; crosscheck of expected versus available data (all instruments)


An alternative delivery option was made available for use in the main survey. Under this system,
the delivery engine was installed on a notebook computer connected to a school LAN. The school
could then access the test locally over the school LAN via Mozilla Firefox or Google Chrome.


School coordinators were provided with the following list of minimum specifications for computers
to run the ICILS student test from USB sticks.

• Screen resolution of at least 1024x768;

• One of the following operating systems: Windows XP (Service Pack 3)/Windows Vista/Windows
7/Windows 8/Windows 10;

• Display (fonts) set to the optimal size (minimum: 100%; the system font size can be set from
the computer's Control Panel/Appearance and Personalization/Display); and

• A USB port (version 2.0 or higher).
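A coordinator working through many school computers could turn the list above into a simple pre-session check. The sketch below is hypothetical: the operating system labels and the idea of passing in already-collected machine details are assumptions, and only the thresholds come from the list. It does not query real hardware.

```python
from typing import List, Tuple

# Hypothetical OS labels; the supported versions come from the USB requirements list.
SUPPORTED_OS = {"Windows XP SP3", "Windows Vista", "Windows 7", "Windows 8", "Windows 10"}

def usb_spec_problems(resolution: Tuple[int, int], os_name: str,
                      font_scale_pct: int, usb_version: float) -> List[str]:
    """Return the minimum USB-delivery specifications a computer fails (empty list = OK)."""
    problems = []
    width, height = resolution
    if width < 1024 or height < 768:
        problems.append("screen resolution below 1024x768")
    if os_name not in SUPPORTED_OS:
        problems.append(f"unsupported operating system: {os_name}")
    if font_scale_pct < 100:
        problems.append("display font size below 100%")
    if usb_version < 2.0:
        problems.append("no USB 2.0 (or higher) port")
    return problems

print(usb_spec_problems((1366, 768), "Windows 10", 100, 3.0))  # []
```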


School coordinators were provided with the following list of minimum specifications for computers 
to run the ICILS student test using the local server method (LAN). 






Requirements for the server computer:

• Screen resolution of at least 1024x768;

• One of the following operating systems: Windows 7/Windows 8/Windows 10* (*Windows 10
was recommended for the best experience);

• A recommended CPU of 1500 MHz;

• An installed system memory of at least 4 GB;

• At least 10 GB of free storage space on the hard drive;

• LAN or Wi-Fi connectivity; and

• Internet connectivity (only required for data upload).


Requirements for the client computers:

• Screen resolution of at least 1024x768;

• Connectivity to the same LAN or Wi-Fi network as the server computer;

• Recent version (less than 12 months old) of the Mozilla Firefox or Google Chrome browser
installed; and

• Display (fonts) set to the optimal size (minimum: 100%; the system font size can be set from
the computer's Control Panel/Appearance and Personalization/Display).

The delivery module was developed to include the following features:

• Delivery of multiple choice, constructed response, and drag and drop items;

• Delivery of linear and nonlinear skills tasks (based on software simulations with real-world
responsiveness);

• Delivery of large/authoring tasks (interactive software applications with functionality to add,
edit, and format text in a range of formats);

• Display of closed web environments with the facility to display multiple pages and navigate
between and within pages;

• Capture of all student final responses to each task;

• Capture of time taken on each task;

• Facility to score student responses to skills tasks;

• A progress display (number of tasks completed and to come per module) for students; and

• A countdown timer for students per module.


Developing the translation systems 


In ICILS 2018 two different translation systems were used: the IEA Translation System and AM
Designer. This corresponds to the fact that two delivery systems for the electronic administrations
of ICILS 2018 instruments were also used. The school, teacher, and ICT coordinator questionnaires
were administered via the IEA OSS (if not administered on paper) while the student tests and
questionnaires were administered using AM Examiner.

Both translation systems are web-based applications accessed through a web-browser. Users
needed the following in order to access them:

• A computer with a broadband or equivalent high-speed internet connection; and

• A web browser, e.g., Mozilla Firefox or Google Chrome.

The translation systems were developed to provide the following features:

• Selective functionality relating to the role of the user (e.g., administrator, translator,
translation reviewer, translation verifier, verification reviewer);

• Opportunity to enter translated text for all screen elements;

• Ability for those countries where the test would be administered in more than one language to
select the target language;

• Ability to view the translation history of a text element;

• Ability to view the source text, the translated text, and any previous revisions of the translated
text, and to compare translated versions;

• A comments interface to allow translators, reviewers, and verifiers to enter comments linked
to elements of text;

• Opportunity to view “live” translated versions of the tasks at any point during the translation
and to simultaneously view (in a separate window) a live version of the source task;

• Ability to enter text as plain text, including editable HTML tags, or to enter and use formatting
tools to format text;

• Ability to view translated text elements as plain text or as rendered HTML;

• Ability to search for and/or selectively bulk-replace text within and across tasks;

• Ability to bulk-import the field trial translations so that they could be used as a starting point
for main survey translations; and

• Opportunity for translators, reviewers, and verifiers to monitor the progress of task completion
(i.e., translation state).


Both systems included the functionalities listed above. The school, teacher, and ICT coordinator
questionnaires were translated and verified within the IEA Translation System while the student
instruments (test and questionnaires) were translated and verified within AM Designer.


Developing the scoring module

The ICILS 2018 scoring module, AM Marker, was an adaptation of the ICILS 2013 scoring module.
This adaptation included the development of some new features for use in ICILS 2018. It also
included replacing (where possible) text in the user interface with icons so that the user interface
did not need to be translated. (ICILS 2018 did not require scorers to be proficient in English.)

The web-based scoring application could be accessed through a web-browser. In order to access
the application, users needed:

• A computer with a broadband or equivalent high-speed internet connection; and

• A recent version (12 months old or less) of Mozilla Firefox or Google Chrome.

AM Marker was developed to include the following features:


• Selective functionality relating to the role of the user (e.g., administrator, scoring trainer, team
leader, scorer);

• Facility to specify a proportion (in ICILS 2018, 20%) of student responses to be blind double-
scored for the purpose of monitoring inter-rater reliability; and

• Capacity to begin scoring before all student responses had been uploaded to the system (i.e.,
before completion of data collection in schools) without compromising the double-scoring
procedure.
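The last two features interact: if double-scoring is to start before all responses have arrived, each response has to be assigned to (or excluded from) the double-scored sample on its own, without waiting for the full set. One way to do this, shown here as an assumption rather than a description of how AM Marker actually worked, is to derive a stable pseudo-random bucket from each response's identifier. The identifier format is hypothetical.

```python
import hashlib

DOUBLE_SCORE_SHARE = 0.20  # proportion used in ICILS 2018; the selection method is assumed

def selected_for_double_scoring(response_id: str, share: float = DOUBLE_SCORE_SHARE) -> bool:
    """Decide, from the response ID alone, whether a response is blind double-scored.

    The same ID always gets the same decision, so responses can be routed as they
    are uploaded, in any order, and about `share` of them end up double-scored.
    """
    digest = hashlib.sha256(response_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest, "big") % 100   # pseudo-random bucket 0..99
    return bucket < round(share * 100)

ids = [f"student-{i}" for i in range(10_000)]       # hypothetical response IDs
print(sum(selected_for_double_scoring(i) for i in ids) / len(ids))  # close to 0.2
```

The design choice here is determinism: a hash of the ID gives each scorer-allocation decision independently of upload order, which an approach that samples 20% of a completed list could not do.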


Scorers were able to:

• View tasks (as they appeared to students in the test) along with student responses on screen;

• Enter a score for each student for each task;

• Flag pieces of work for follow-up with a more senior staff member;

• Navigate back to previously scored pieces of work and amend scores;

• View large tasks with full functionality; and

• Enter a “training” mode to score pre-scored work and receive feedback on scoring accuracy.


In addition, team leaders could:

• Review (check-score and amend) scorers’ scores; and

• Monitor and respond to flagged pieces of work.

In addition, scoring trainers could:

• Select, order, and annotate responses for use in scorer training.

In addition, scoring administrators could:

• Allocate scorers to teams; and

• View reports of interscorer reliability.
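The chapter does not state which statistic the interscorer reliability reports contained. As an illustrative assumption only, the sketch below computes percent exact agreement per task from blind double-scored pairs, one common and simple summary; the record layout `(task_id, first_score, second_score)` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def agreement_by_task(records: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Percent exact agreement per task from (task_id, first_score, second_score) records."""
    counts = defaultdict(lambda: [0, 0])  # task_id -> [agreements, double-scored pairs]
    for task_id, first, second in records:
        counts[task_id][0] += int(first == second)
        counts[task_id][1] += 1
    return {task: 100 * agree / n for task, (agree, n) in counts.items()}

pairs = [("T1", 2, 2), ("T1", 1, 2), ("T2", 0, 0)]
print(agreement_by_task(pairs))  # {'T1': 50.0, 'T2': 100.0}
```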


Developing the administration module 


The ICILS 2018 administration module, AM Manager, was developed to (i) support national center staff to monitor the progress of test sessions (i.e., data upload), create user accounts, and define roles for users of the other modules (scoring and translation), and (ii) export test-session data for importing into IEA WinW3S.


The administration module was a web-based application that users accessed through a web browser. In order to access the application, users needed:

• A computer with a broadband or equivalent high-speed internet connection; and
• A recent version (12 months old or less) of Mozilla Firefox or Google Chrome.


Development of the administration module focused on ensuring that users could:

• Create user accounts and allocate roles;
• Download the software image of the ICILS 2018 test (including the test delivery engine);
• View national test session details;
• Monitor national test session status; and
• Export test session data for IEA WinW3S.


Challenges with computer-based delivery in ICILS 2018

Successful collection of data in large-scale computer-based surveys relies on individual participants being able to provide responses on computers in controlled uniform environments from which data can be recorded, organized, and stored for later use. When the data being collected is assessment data, it is imperative that all respondents experience the tasks in an identical manner.

Uniformity of test presentation and administration is easily assured in a printed test, but CBAs present additional challenges arising out of variations in presentation, administration, and the utilized infrastructure. These issues can influence respondents’ performance and are especially pronounced in complex tasks, such as those involving (simulated) multimedia applications or other types of rich or interactive stimulus. These tasks place greater demands on delivery methods than standard, flat stimulus material and multiple-choice response options.

The ICILS 2018 test-delivery platform, including its USB-delivery option, was designed to minimize the potential for variation in students’ test-taking experience. The on-screen appearance of the ICILS 2018 instruments could be checked (as a feature of the translation module) throughout the translation and verification processes and was formally verified during the layout verification process. Chapter 5 of this report describes these processes in detail.


The data collected during ICILS 2018 needed to be extracted, transformed, and then loaded into systems for processing, weighting, and analysis. These tasks required the interaction of a complex set of computer-based systems comprising interconnected components (as shown in Table 3.1). As is the case with any system comprising interconnected components with a full range of specific functions, the interfaces between each component are potential points of failure. To ensure that the system worked successfully, survey coordinators needed to monitor both the integrity of the functioning of the individual components of the system (paying special attention to ensuring results data were being stored) and the interfaces between the components.

It is not possible in this section of the report to look at the full range of sources and types of information described above. Instead, the focus is on key findings relating to one particular type of information, that is, the proportion and typology of cases that national survey coordinators and their staff classified as technical problems during administration. Nonresponse at the unit or item level, whether due to a lack of cooperation, technical issues, or flawed administration, is one of the more serious factors influencing the robustness and [...] of inferences from sample surveys. Those responsible for implementing a survey nationally therefore regularly monitor response and participation rates, especially as these are used as key indicators of success, quality, and fitness for use.

During ICILS 2018, the inspection of unweighted yet cleaned data from all 14 participating education systems (52,625 sampled students) yielded some interesting insights. Test administrators were asked to code the participation status of each sampled student on a student tracking form, and the coding scheme included codes for technical problems before, during, or after the CBA sessions; participation in the test and questionnaire sessions was coded separately.

Table 3.2 presents aggregated percentages of these dispositions for all sampled students, excluding any noncooperating schools. Percentages are aggregated across all countries because the per-country proportions were similar.


Table 3.2: Unweighted participation status percentages across participants based on full database and original coding by test administrators (reference: all sampled students)

Participation status                                  Test     Questionnaire
Left school permanently                               1.3%     1.3%
Parental permission denied                            2.6%     4.2%
Absent                                                6.4%     7.0%
Incompatible or failed equipment before assessment    0.1%     0.1%
Technical failure during assessment                   0.3%     0.1%
USB stick lost or upload failed after assessment      0.1%     0.1%
Participated                                          89.2%    87.2%
Total                                                 100%     100%


Table 3.2 shows that test and questionnaire data were collected from 89.2 percent and 87.2 percent of sampled students respectively. The slightly lower percentage of data collected for the questionnaire is a result of two factors:

• Students may have been given a break between the test and the questionnaire, so some of the students who completed the test may not have returned to complete the questionnaire (0.6% more students were absent for the questionnaire than for the test).
• In some countries, parents gave permission for their children to complete the test but not the questionnaire (resulting in a further 1.6% difference in nonparticipation).


In some cases, the test administrators managed to resolve technical issues for the questionnaire session, where time is not as critical as during the test session (0.2% fewer technical failures during the questionnaire session).


Roughly 7.7 percent of sampled students overall had either permanently left school (1.3%) or were absent on the day of testing (6.4%).
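The percentages in Table 3.2 can be reconciled with the explanations given above. The short Python check below assumes that table rows printed with a single value apply to both the test and the questionnaire columns (with that reading, each column sums to 100%). It confirms that 0.6 points of extra absences plus 1.6 points of extra permission refusals, minus 0.2 points of fewer technical failures, equals the 2.0-point gap between 89.2% and 87.2%.

```python
# Unweighted percentages of all sampled students, from Table 3.2.
# Assumption: rows printed with a single value apply to both columns.
TEST = {
    "left_school": 1.3, "permission_denied": 2.6, "absent": 6.4,
    "equipment_before": 0.1, "failure_during": 0.3, "lost_after": 0.1,
    "participated": 89.2,
}
QUESTIONNAIRE = {
    "left_school": 1.3, "permission_denied": 4.2, "absent": 7.0,
    "equipment_before": 0.1, "failure_during": 0.1, "lost_after": 0.1,
    "participated": 87.2,
}


def pp(value: float) -> float:
    """Round to one decimal (percentage points) to remove float noise."""
    return round(value, 1)


def participation_gap(test, questionnaire):
    """Break the test/questionnaire participation gap into its components."""
    return {
        "gap": pp(test["participated"] - questionnaire["participated"]),
        "absent": pp(questionnaire["absent"] - test["absent"]),
        "permission": pp(questionnaire["permission_denied"] - test["permission_denied"]),
        "technical": pp(test["failure_during"] - questionnaire["failure_during"]),
    }


parts = participation_gap(TEST, QUESTIONNAIRE)
print(parts)  # {'gap': 2.0, 'absent': 0.6, 'permission': 1.6, 'technical': 0.2}
```

The same table also gives the 7.7 percent quoted above as 1.3 (left school) plus 6.4 (absent).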


Please note that the above percentages are based on raw data from the database and consequently may not match other figures relating to participation rates. The presented proportions are from unweighted data and as initially assigned by test administrators. We therefore subjected these data to further adjudication. We identified some cases initially coded as a technical problem during the test or questionnaire session which had (at least partial) responses and should have been classified as participation by the students concerned. These cases, in which a technical problem had been flagged in the student tracking forms but had eventually been resolved (as evidenced by the presence of student data), corresponded to incorrect flagging in the respective student tracking forms. It is also possible that a small number of technical problems went unnoticed by the test administrators. Chapter 11 describes how data with some residual technical failures (evidenced or assumed) were eventually treated during the calibration and scaling of test data.

An initial review of a survey activities questionnaire completed by national research coordinators (NRCs) indicated no systematic failures or problems with the testing system, although technical issues were observed in comparable proportions before, during, and after the assessment. However, none of the NRCs indicated or reported a lack of suitable delivery options (USB plus the local server alternative) as responsible for instances of school nonresponse. The fact that each assessment was executed on a different computer (except for carry-in laptops that were uniformly configured) may by itself account for the small number of apparently unconnected errors observed.

Some NRCs did report the general challenge of attaining school cooperation and participation in light of survey fatigue, industrial actions, or other factors unrelated to the assessment and its data collection mode. Among those schools that agreed to participate, the proportion of students not participating because of (i) denied parental permissions, (ii) school leaving, or (iii) other types of incidental absences (e.g., sick-listing) accounted for a level of nonresponse many times larger than that due to technical issues.

Compared to ICILS 2013, technical problems (cases involving lost USB sticks, corrupted data, or data that could not be uploaded) seem to have been only a minor issue in ICILS 2018 (0.1% of all sampled students), whereas in ICILS 2013 they constituted the largest factor (0.9%).


CHAPTER 4: 


ICILS 2018 questionnaire development 


John Ainley and Wolfram Schulz 


Introduction 


In this chapter, we describe the development of the ICILS 2018 questionnaires for students, teachers, school principals, ICT coordinators, and national research centers.

The student questionnaire was designed to gather information about students’ personal and home background as well as use of and familiarity with computers and computing. It included questions relating to students’ background characteristics, their experience and use of computers and ICT to complete a range of different tasks in school and out of school, and their attitudes toward the use of computers and ICT in society.

The teacher questionnaire was designed to gather teachers’ perspectives on the school environment for ICT use, their confidence in using ICT, their attitudes regarding the use of ICT in teaching, and their use of ICT for teaching and learning.

There were two school questionnaires: one completed by the school principal and one completed by the ICT coordinator. School principals were asked to report on ICT use in learning and teaching at their school, school characteristics, and teachers’ professional development in using ICT at school. The ICT coordinator questionnaire included questions about the availability of ICT resources at school (e.g., infrastructure, hardware, and software) as well as pedagogical support for the use of ICT.

An online questionnaire, the national contexts survey, for the national research coordinators (NRCs) was designed to collect contextual information at the national (or sub-national) level about the characteristics of education systems, plans, policies for using ICT in education, ICT and student learning at lower-secondary level, ICT as part of teachers’ professional development, and the existence of ICT-based learning and administration systems in the country.

The conceptual framework used to guide questionnaire development

The assessment framework provided a conceptual underpinning for the development of the international instrumentation for ICILS (Fraillon et al. 2019). The assessment framework consisted of three parts:

• The computer and information literacy framework: This outlined the measure of computer and information literacy (CIL) that was addressed through a cognitive test and those parts of the student questionnaire designed to measure student perceptions regarding CIL.
• The computational thinking framework: This outlined the measure of computational thinking (CT) that was addressed through a cognitive test and those parts of the student questionnaire designed to measure student perceptions related to CT.
• The contextual framework: This mapped the context factors expected to influence outcomes and explain variation in those outcomes, and mapped student usage of ICT in school and out-of-school settings.

The contextual framework identified the variables that reflect the environment in which CIL and CT learning takes place. It assumes that young people develop their understandings of ICT through activities and experiences that take place in schools and classrooms, and in their homes and other places outside school.

Students’ development of CIL and CT is influenced by what happens in their schools and classrooms (the instruction they receive, the availability and use of ICT resources in schools, ICT use for teaching and learning), their home environments (socioeconomic background, availability and use of ICT at home), and their individual characteristics. Factors related to each of these different levels shape the way students respond to learning about computers and computing.


Contextual influences on CIL and CT learning are conceived as either antecedents or processes 
(Figure 4.1). Antecedents refer to the general background that affects how CIL and CT learning 
takes place (e.g., through context factors such as ICT provision and curricular policies that shape 
how learning about ICT is provided). Process-related variables are those factors shaping CIL and 
CT learning more directly (e.g., the extent of opportunities for CIL and CT learning during class, 
teacher attitudes toward ICT for study tasks, and students’ computer use at home). 


Figure 4.1: Contexts for ICILS 2018 CIL/CT learning outcomes

[Figure not reproduced: a diagram with columns for antecedents, processes, and outcomes, linking school/classroom characteristics, stated ICT curriculum, and ICT resources to school/classroom ICT use for teaching/learning and CIL/ICT instruction, and student characteristics to the student learning process.]


Table 4.1 maps the variables collected through the contextual instruments onto this conceptual framework and gives examples of the different types of measures; its columns contain antecedents and processes, and its rows the four levels. The student questionnaire collected data on student experience, use, and perceptions of ICT as well as contextual factors at the individual (either school or home) level. The teacher, principal, and ICT coordinator questionnaires focused on gathering data to be used at the school level, while the national contexts survey and published sources provided variables at the system or national level.




Table 4.1: Mapping of variables to contextual framework with examples

Level of ...        Antecedents                     Processes                           Outcomes
Wider community     NCS & other sources:            NCS & other sources:
                    Structure of education          Role of ICT in curriculum
                    Accessibility of ICT
School/classroom    PrQ, ICQ, & TQ:                 PrQ, ICQ, TQ, & StQ:
                    School characteristics          ICT use in teaching and learning
                    ICT resources                   CIL/CT instruction
Student             StQ:                            StQ:                                CIL/CT
                    Gender                          ICT activities
                    Age                             Use of ICT
Home environment    StQ:                            StQ:
                    Parent socioeconomic status     Learning about ICT at home
                    Home ICT resources

Notes: NCS = national contexts survey; PrQ = principal questionnaire; ICQ = ICT coordinator questionnaire; TQ = teacher questionnaire; StQ = student questionnaire.


Development of the ICILS context questionnaires 


The international study center (ISC) at ACER coordinated the development and implementation of the ICILS 2018 context questionnaires for students, teachers, and schools. Several national centers also provided suggestions for additional national questionnaire item material. The development work included extensive reviews and discussions at different stages of the process with experts and national centers.

The development process for the student, teacher, and school questionnaires followed a sequence which paralleled that for the CIL and CT tests. Specifically, the questionnaire development involved three phases:

• Phase 1: From February 2015 to September 2016, field trial material was developed with the content being guided by the assessment framework. This phase also included various rounds of consultations and reviews by NRCs (at meetings in March 2015, February 2016, and September 2016 as well as via online reviews in June and July 2016).
• Phase 2: Preparations for the field trial took place from November 2016 to May 2017. From May to June 2017, the international field trial was conducted in 14 participating ICILS countries. Subsequent analyses of field trial data took place from July to August 2017 to inform judgements about the suitability of questionnaire material for the main survey.
• Phase 3: The final phase focused on the selection of items for the main survey. From September to October 2017, ISC staff discussed the field trial results with staff in the national centers (including a meeting with NRCs in September 2017). Phase 3 concluded with a final selection of main survey items. From November 2017 to February 2018, survey materials were translated and the translations were verified.

The procedures and criteria used to review the field trial material and results were the same for the student, teacher, and school questionnaires. During instrument development, particular attention was paid to the appropriateness of questionnaire material for the large variety of national contexts in participating countries as well as to existing differences between education systems and between schools within each participating education system.


The following criteria informed the selection of item material for the main survey:

• Relevance with regard to the ICILS 2018 assessment framework;
• Appropriateness for the national contexts of the participating countries; and
• Psychometric properties of items designed to measure latent traits.

Because the national contexts survey did not include a field trial (see section below), the procedures used to develop it differed from those used for other context questionnaires.
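One psychometric property relevant to the criteria above is the reliability (internal consistency) of a scale built from several items. This section does not name the coefficient used; as an illustration only, the Python sketch below computes Cronbach's alpha, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores), and applies it separately to each education system, since measurement equivalence across national contexts was a concern. Item scores are assumed numeric and complete (no missing responses), and the function names are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for one scale.

    responses: list of respondents, each a list of numeric item scores
    (same items in the same order, no missing values).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # one tuple of scores per item
    item_variance_sum = sum(pvariance(item) for item in items)
    total_variance = pvariance([sum(person) for person in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def alpha_by_system(responses_by_system):
    """Compute alpha separately for each education system."""
    return {system: cronbach_alpha(rows) for system, rows in responses_by_system.items()}
```

Population variances are used for both the items and the total; the choice does not matter for alpha as long as it is consistent, because the n versus n - 1 factor cancels in the ratio.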


Development of the student questionnaire 
Students completed the ICILS student questionnaire on computer after they had completed the CIL test. In countries taking part in the CT option, the student questionnaire was completed after the CIL test and before the CT test. The student questionnaire collected information about the students’ characteristics and their home background as well as the students’ use of and familiarity with ICT. ICILS researchers at ACER coordinated the development and implementation of the student questionnaire.

The ICILS field trial student questionnaire material included a total of 33 questions (30 were international core questions and three were optional) with 171 items (148 were core and 23 were optional). To reduce the length of the assessment time per student while trialing a larger pool of item material, the field trial student questionnaire was administered in three different forms. We allocated these forms in such a way that it was possible to analyze all possible combinations of item sets and scales. The field trial questionnaire was completed by samples consisting of 6695 students from 350 schools. On average, 478 students in each country provided data for the field trial analysis.

The analyses of the field trial data provided empirical evidence on the quality of the item material and informed the selection of the main survey material. ISC staff particularly emphasized the need to investigate the cross-national validity of the measures derived from the ICILS questionnaires (see examples from the International Civic and Citizenship Education Study [ICCS] 2009 in Schulz 2009). ISC staff analyzed field trial data in terms of the distributions of responses (to check for skewness, etc.), the dimensionality and structure of the scales representing constructs and based on items (using exploratory and confirmatory factor analyses), and the reliabilities of the scales based on those sets of items designed to measure the corresponding constructs. These analyses were conducted for each participating education system in order to check that the measurement of instruments was equivalent across national contexts. Staff then discussed the field trial outcomes and the proposed draft student questionnaire for the main survey with NRCs. Their feedback helped the ISC staff to select item material for the main survey instruments used in the final data-collection stage of ICILS 2018.

The final international student questionnaire consisted of 31 questions (28 were core questions and three were optional for countries) containing 135 items (123 were core and 12 were optional for countries), to be completed within the targeted time of 20 to 25 minutes. Twenty-two of the items (including one optional item) were designed to capture student-background information (including ICT experience), and 113 were designed to measure students’ use of and familiarity with ICT (including 11 optional items). The main survey student questionnaire consisted of the following five sections:


• About you: This section included questions about the student’s age, gender, and expected education.
• Your home and your family: These questions focused on characteristics of the students’ homes (including ICT resources) and their parents’ occupations and educational backgrounds.


• Your use of ICT: These questions asked students to report on their experience with ICT, who taught them various ICT skills, their use at different locations, and their frequency of use of ICT for different activities.
• Using ICT for school: These questions asked students to indicate their frequency of use of ICT for school-related purposes in general and in specified learning areas, their use of specified ICT tools, and the extent to which they had learned at school how to do specified ICT tasks.
• Your thoughts about using and learning about ICT: These questions were designed to measure students’ beliefs about their ability to do ICT tasks (ICT self-efficacy), their attitudes toward ICT in society, and the extent to which they had learned about CT-related tasks.

The student questionnaire items were designed to be administered on computer. The computer delivery system was configured to support the quality of data provided by students. The system could, for example, prevent students from giving invalid responses (such as selecting multiple contradictory answers where only one was required) and direct students to targeted questions on the basis of a given response. For instance, when answering the questions on parental occupation, students were first asked if their parents were currently in a paid job and were then directed to more specific questions about each parent’s current occupation (if currently in a paid job) or previous occupation (if not currently in a paid job).

Development of the teacher questionnaire


The teacher questionnaire was designed to collect contextual information about school and classroom contexts for ICT learning, use of ICT for teaching and learning, teacher views on the pedagogical use of ICT, and their confidence in using computers. ISC staff at ACER coordinated the development and implementation of the questionnaire and asked external experts and experts from national centers to review it at different stages of the study. The questionnaire was administered primarily through an online system, but with provision for paper-based delivery in case teachers were unable or unwilling to complete the questionnaire online.

Under the assumption that teaching staff constitute an important factor in determining the extent to which ICT is used within the school context, the ISC staff responsible for designing the teacher questionnaire directed the questionnaire at all teachers teaching at the target grade (typically grade 8). They also designed the questionnaire so that teachers could complete it in about 30 minutes.

The questionnaire included a question about whether teachers used ICT in their teaching and learning. Those teachers who said yes were asked to provide information relating to a “reference class” about the extent of ICT use in this class for different purposes and activities. The reference class was specified for teachers as the first regular class at the target grade they had taught on or after the last Tuesday before answering the questionnaire.

The field trial teacher questionnaire consisted of 20 questions (including two optional questions) with a total of 159 items (including 22 optional items). It was administered to teachers from all subjects teaching at the target grade in the schools selected for the field trial. The field trial teacher sample consisted of 4236 teachers from 365 schools in 14 participating countries. On average, the field trial teacher samples consisted of about 300 teachers in each participating country. The analyses of the field trial teacher questionnaire data provided a basis for the selection of item material for the main survey.

The final main survey teacher questionnaire consisted of 18 questions with 116 items (including 10 items that were optional to countries) and was divided into the following six sections:

• About you: These questions concerned teachers’ background characteristics and the subjects that they taught.


• Your use of ICT: These questions focused on teachers’ experience of ICT, their frequency of use of ICT, and their confidence in performing ICT tasks.
• Your use of ICT in teaching: These questions asked teachers to name a reference class, provide information about the subject taught in that class, and state whether they used ICT for teaching and learning activities in this class. Those teachers who said they used ICT were asked to indicate the emphasis given to the development of CIL and CT-related capabilities and their use of ICT for various class activities and teaching practices. They were also asked to indicate the frequency with which they used various ICT tools in their teaching of this class.
• In your school: The questions in this section asked the teachers about their views on using ICT in teaching and learning. The questions also asked teachers about provision for and practices concerning the use of ICT in their school.
• Learning to use ICT in teaching: This section asked teachers about whether their initial teacher education included learning to use ICT and whether they had participated in ICT-related professional learning.
• Approaches to teaching: This asked teachers to indicate the extent to which they agreed with a series of statements about using ICT in teaching and learning at school.
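The reference-class rule for teachers (the first regular class at the target grade taught on or after the last Tuesday before the teacher answered the questionnaire) is a small piece of date logic. The Python sketch below expresses it under two stated assumptions: a teacher answering on a Tuesday counts back to the previous week's Tuesday, and lessons are available as hypothetical (date, grade, class name, regular-class flag) records. In the study, teachers applied this rule themselves; the code is only an illustration of it.

```python
from datetime import date, timedelta

TUESDAY = 1  # date.weekday(): Monday == 0 ... Sunday == 6


def last_tuesday_before(answer_date: date) -> date:
    """Most recent Tuesday strictly before answer_date.

    Assumption: answering on a Tuesday refers back to the previous week's Tuesday.
    """
    days_back = (answer_date.weekday() - TUESDAY) % 7 or 7
    return answer_date - timedelta(days=days_back)


def reference_class(answer_date, lessons, target_grade):
    """First regular target-grade class taught on or after that Tuesday.

    lessons: iterable of (lesson_date, grade, class_name, is_regular) tuples
    (a hypothetical record layout). Ties on the same day are broken by class name.
    """
    cutoff = last_tuesday_before(answer_date)
    eligible = [
        (lesson_date, class_name)
        for lesson_date, grade, class_name, is_regular in lessons
        if is_regular and grade == target_grade and lesson_date >= cutoff
    ]
    return min(eligible)[1] if eligible else None
```

Anchoring the choice to a fixed weekday rather than letting teachers pick a class is a way to reduce the chance that teachers select a class where they use ICT unusually often.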


Development of the school principal and ICT coordinator questionnaires

The school questionnaires were designed to collect information about the school context in general and the use of ICT in teaching and learning in particular. Two questionnaires were used to collect this information. The first was directed to the school principal and the other to the school ICT coordinator. Both questionnaires were delivered online by default, but an alternative paper-based version was available in cases where respondents were unable or unwilling to complete them on a computer.


Factors relating to the school context included school characteristics (such as school size, management, and resources), the availability of ICT resources, professional development regarding ICT use for teachers, and expectations for ICT use and learning.


The questionnaire for school principals was designed to be completed in 15 minutes. The questions 
addressed school characteristics as well as school principals’ perceptions of ICT use for teaching 
and learning at their schools. 


The ICT coordinator questionnaire was also designed to be answered in 10 minutes. It included 
predominantly objective questions about the respective schools’ ICT resources and their processes 
and policies with regard to this area. 


The school principal questionnaire for the field trial included 18 questions with a total of 92 items, 
and was administered to 340 principals from the 14 countries that participated in the field trial. In 
most countries, about 24 school principals provided responses to the field trial questionnaire. The 
ICT coordinator questionnaire for the field trial consisted of 17 questions with a total of 91 items 
and was completed by 327 ICT coordinators at participating schools in 14 countries. 
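The per-country averages quoted for the field trial samples in this chapter follow from dividing each reported total by the 14 participating countries. A quick Python check, using only the counts reported in this chapter:

```python
FIELD_TRIAL_COUNTRIES = 14

# Field trial respondent totals reported in this chapter.
respondents = {
    "students": 6695,
    "teachers": 4236,
    "principals": 340,
    "ict_coordinators": 327,
}

average_per_country = {
    group: round(count / FIELD_TRIAL_COUNTRIES, 1) for group, count in respondents.items()
}
print(average_per_country)
# {'students': 478.2, 'teachers': 302.6, 'principals': 24.3, 'ict_coordinators': 23.4}
```

These simple means agree with the reported figures of 478 students, about 300 teachers, and about 24 school principals per country.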


The analyses of field trial data focused on providing empirical evidence that would assist selection 
of the main survey material. However, the relatively small number of responses in each of the 
participating countries (the maximum was only one per school) meant that analyses of the field 
trial data gathered by the two questionnaires were limited in scope. 


The ISC research team discussed the results of the school questionnaire field trial with NRCs 
before selecting the items that would be included in the final main survey instrument. Revisions 
made after the field trial included a rewording of some of the items. 




The review of the field trial outcomes led to a reduction in the size of the school principal questionnaire, which consisted of 15 questions with a total of 94 items spread across the following four sections:


• About you and your use of ICT: This section asked school principals about their gender and ICT use.
• Your school: This section contained questions about school size, grades taught at the school, community size, and school management.
• ICT and teaching in your school: This section consisted of questions about the importance assigned to ICT use at school, monitoring of ICT use by teachers, and expectations about teacher use of ICT.
• Management of ICT in your school: This section contained questions about ICT management, ICT-related procedures, ICT-related professional development for teachers, and priorities for ICT use in teaching and learning.


The final ICT coordinator questionnaire comprised 15 questions (including two optional questions) 
with a total of 87 items (including 11 optional items). It contained the following three sections: 


• About your position: This section asked ICT coordinators about their position at school and their school’s experience with computers for teaching and learning.
• Resources for ICT: This second section included questions on the ICT equipment available at school.
• ICT support: This section consisted of questions on the support provided for ICT use at school and/or the extent to which a lack of resources was hindering that use.


Development and implementation of the national contexts survey 


The ways in which students develop CIL and CT are potentially influenced by factors located at the country or national context level. These variables include, among others, the education system in general as well as policies on, and the curricular background of, CIL and CT education. The national contexts survey was designed to collect relevant data and information about both antecedents and processes at the country level. The experience of studies such as the Second Information Technology in Education Study (SITES) 2006 (Plomp et al. 2009), the US Department of Education (2011) study of educational technology, and ICILS 2013 (Fraillon et al. 2014) informed the development of the national contexts survey.


ICILS staff at the ISC at ACER organized the development and coordination of the national 
contexts survey as well as the analyses, verification, and reporting of the data collected by this 
instrument. Throughout this work, the ISC staff worked closely with national center staff from 
the participating countries. 


The development and implementation work consisted of three phases: 


• Phase 1: During this first phase, which spanned May to August 2017, the ISC team, in discussion with the national centers, reached agreement on the nature and scope of the survey’s contexts and questions. During this phase, international project team members and national center staff discussed the various draft versions of the survey and reached agreement on a final version.
• Phase 2: Between March 2018 and January 2019, the NRCs answered the national contexts survey.
• Phase 3: The final phase took place between October 2018 and July 2019. During this phase, ISC staff reviewed the collected information and, where necessary, verified the outcomes with national centers.




ICILS 2018 TECHNICAL REPORT 


During the development phase of the national contexts survey, the research team applied the 
following criteria when considering which contexts and questions to include in it: 


• Relevance of content with regard to the ICILS 2018 assessment framework;
• Relevance and additional value of gathering information about the wider community context of CIL and CT education;
• Appropriateness of the instrument for the national contexts of the participating countries; and
• Validity of the measured variables in terms of comparability, analysis, and reporting.


The following issues were considered for modification, refinement, and further development of the ICILS 2013 national contexts survey:

• The increasing role of tablet-based tools in education;
• Changes in policies related to ICT in education since 2013;
• Issues related to specific regions or groups of countries;
• Alternative response formats to improve the measurement of aspects of educational policies and curricula;
• Ways of increasing the objectivity of information provided by asking a greater proportion of factual questions and involving national experts to a greater extent; and
• Outcomes of a review reflecting the extent to which national contexts survey data from ICILS 2013 had proved useful for reporting.


The final version of the national contexts survey was placed, along with accompanying notes for guidance, online via servers at IEA. National centers were requested to draw on expertise in the field of ICT-related education in their countries when answering the survey.

The survey consisted of 25 questions including 163 items (some items were fixed responses and others asked for text response) plus 26 requests for an elaboration or comment. The questions asked respondents about key antecedents and processes in relation to CIL and CT education in their country. The questions were grouped into five sections:

• Education system;
• Plans and policies for using ICT in education;
• ICT and student learning at lower-secondary level;
• ICT and teacher development; and
• ICT-based learning and administrative management systems.

The online facility enabled national center staff to complete the survey in several administration sessions (i.e., they could log on and off in order to complete the questionnaire as needed information became available).

The ISC used the outcomes of the national contexts survey in conjunction with data from published sources to inform the descriptions of the education systems participating in ICILS. To ensure a maximum of accuracy regarding the information reported in the international report, national centers were invited to conduct detailed reviews of all reported outcomes that were based on the national contexts survey data collection.



References 

Fraillon, J., Ainley, J., Schulz, W., Duckworth, D., & Friedman, T. (2019). IEA International Computer and Information Literacy Study 2018 assessment framework. Cham, Switzerland: Springer. https://www.springer.com/gp/book/9783030193881

Fraillon, J., Ainley, J., Schulz, W., Friedman, T., & Gebhardt, E. (2014). Preparing for life in a digital age: The IEA International Computer and Information Literacy Study international report. Cham, Switzerland: Springer. https://www.springer.com/gp/book/9783319142210

Plomp, T., Anderson, R.E., Law, N., & Quale, A. (Eds.). (2009). Cross national policies and practices on information 
and communication technology in education (2nd ed.). Greenwich, CT: Information Age Publishing. 

Schulz, W. (2009). Questionnaire construct validation in the International Civic and Citizenship Education 
Study. IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments, vol. 2, 113-135. 

US Department of Education, Office of Educational Technology. (2011). International experiences with 
educational technology: Final report. Washington, DC: Author. 


CHAPTER 5: 


Instrument preparation and verification of 
the ICILS 2018 study instruments 


Sandra Dohr, Tim Friedman, David Ebbs, and Lauren Musu 


Introduction 


The ICILS 2018 international study center (ISC) at ACER developed the English international version of the ICILS student assessment and context questionnaires in close collaboration with the national study centers of the ICILS participating countries and benchmarking participants. Translation, adaptation, and verification of these ICILS study instruments are part of the instrument preparation process and play a key role in ensuring high-quality data and international comparability.

All participating countries and benchmarking participants in ICILS 2018 were required to translate the international version of the ICILS study instruments and adapt them to their national context following detailed guidelines in the ICILS 2018 Survey Operations Procedures Unit 3.

The ultimate aim of the instrument preparation process is to guarantee comparability of the national instruments with the international version, as well as equivalence across countries and languages, while at the same time ensuring appropriateness for each country’s national context and education system. In order to achieve the overarching goal of cross-national comparability and data validity, all national ICILS study instruments underwent a thorough and highly standardized preparation process that comprised several verification stages: adaptation verification, translation verification, and layout verification.

The ISC at ACER and IEA oversaw the instrument preparation and verification processes and were responsible for different stages of the whole process. The ISC was responsible for reviewing and approving all national adaptations of the study instruments, as well as for the layout verification of the student instruments and paper-based questionnaires. IEA implemented adaptations affecting the structure of the principal, teacher, and ICT coordinator questionnaires and was responsible for preparing these questionnaires in the IEA Online Survey System, which was used for online administration of the questionnaires. IEA also managed and coordinated the translation verification process in collaboration with cApStAn Linguistic Quality Control, a provider of translation and translation verification services based in Brussels, Belgium.

Figure 5.1 shows the overall workflow for preparing the national ICILS study instruments, starting with the release of the international version of the ICILS study instruments, followed by the adaptation and translation process and the three verification stages, and ending with the final set of study instruments ready for administration.



Figure 5.1: ICILS 2018 instrument preparation workflow 


[Flowchart: ISC releases international version of instruments → NRCs adapt instruments to national context → Adaptation verification (ISC) → NRCs implement feedback from adaptation verification → Implementation of structural adaptations in online environment (RM & IEA) → NRCs translate study instruments → Translation verification (IEA) → NRCs implement feedback from translation verification → Layout verification (ISC) → NRCs implement feedback from layout verification → ISC releases final national versions of ICILS instruments → NRCs' final check of instruments]


Notes: NRCs = national research coordinators; RM = RM Results (developed the software systems underpinning the 
computer-based student assessment instruments for the main survey). 


Adaptation and translation of the ICILS study instruments 


The adaptation and translation process 


As part of the instrument preparation process, national research coordinators (NRCs) were required to translate the international source version of the ICILS study instruments, which included the assessment items as well as the questionnaires, and adapt them to their specific study needs and cultural context. The consortium provided national centers with the Survey Operations Procedures Unit 3, which described the procedures and guided NRCs through all necessary steps.

Documenting and implementing national adaptations

In ICILS 2018, there were two different types of adaptations that participants could apply to the instruments, namely structural and non-structural adaptations. Structural adaptations refer to alterations that change the structure of the study instruments. This includes adding national questions; adding, splitting, or omitting answer options; or not administering questions. National centers were allowed to apply this type of adaptation solely to the background questionnaires and not to the assessment instruments.

Non-structural adaptations (also referred to as localizations) are defined as any adaptation that does not alter the structure of the study instruments, but may change the meaning to fit the local context. NRCs could add non-structural adaptations to their instruments wherever they considered these necessary. Certain terminology cannot be translated directly into every target language and therefore needed to be replaced by terms or phrases that fit the national context and were appropriate for the target audience. Measurement and punctuation conventions are typical examples of such adaptations.


INSTRUMENT PREPARATION AND VERIFICATION 51 


While NRCs could decide where they considered adaptations necessary, in some cases they were required to adapt certain terminology. The international version of the study instruments indicated where such required adaptations were necessary. Examples of required adaptations included the [target grade], personal names (e.g., [Male name 1]), places (e.g., [M-town]), and International Standard Classification of Education (ISCED) level equivalents. Fictional technical and software-related terminology also required adaptation in some cases. For instance, the term [WebSearch] referred to an online search engine and needed to be adapted to a term that made sense to students.


In the field trial (where adaptations were entered directly into the translation system), structural and non-structural adaptations were handled in two separate stages. For the main survey, however, NRCs had to record all structural and non-structural adaptations for their study instruments at the same time in a document called the national adaptation form (NAF). During the adaptation and translation process, NRCs had to document the national version of the adapted text in the NAF and include an English back translation and, if necessary, an explanation or comment.
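The two NAF rules described above (structural adaptations only in background questionnaires, and an English back translation for every documented national text) can be expressed as a simple validation check. The following Python sketch is illustrative only: the record fields, the names `NafEntry` and `validate_naf_entry`, and the instrument categories are hypothetical and do not reproduce the actual NAF template used in ICILS 2018.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class AdaptationType(Enum):
    STRUCTURAL = "structural"
    NON_STRUCTURAL = "non-structural"  # also referred to as a localization


class InstrumentKind(Enum):
    ASSESSMENT = "assessment"
    BACKGROUND_QUESTIONNAIRE = "background questionnaire"


@dataclass
class NafEntry:
    """One documented adaptation (a hypothetical NAF row)."""
    instrument: str
    kind: InstrumentKind
    element_id: str
    international_text: str
    national_text: str
    back_translation: str
    adaptation_type: AdaptationType
    comment: str = ""


def validate_naf_entry(entry: NafEntry) -> List[str]:
    """Return the rule violations for one NAF entry (an empty list if none)."""
    problems = []
    # Structural adaptations were permitted only in the background questionnaires.
    if (entry.adaptation_type is AdaptationType.STRUCTURAL
            and entry.kind is InstrumentKind.ASSESSMENT):
        problems.append("structural adaptation not allowed in assessment instruments")
    # Every adapted national text needs an English back translation.
    if not entry.back_translation.strip():
        problems.append("missing English back translation")
    return problems


entry = NafEntry(
    instrument="student assessment",
    kind=InstrumentKind.ASSESSMENT,
    element_id="hypothetical-item-01",
    international_text="[WebSearch]",
    national_text="WebSuche",  # hypothetical national term
    back_translation="",
    adaptation_type=AdaptationType.NON_STRUCTURAL,
)
print(validate_naf_entry(entry))  # ['missing English back translation']
```

In practice the ISC reviewed NAF entries manually; a check like this would only catch mechanical omissions, not whether an adaptation preserves meaning and difficulty.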


While national adaptations are important to reflect the country context, maintaining comparability with the international source version is essential. The meaning and difficulty level of the text must therefore be preserved; cognitive items may not be simplified or clarified in a way that would help students identify the correct answer. To help countries adapt their materials to the national context, the ISC provided them with detailed notes that included definitions of specialized terminology in the ICT context as well as general guidelines for translation and adaptation.


Translating the ICILS 2018 study instruments 

In order to create a high-quality translation, it is important to use professional and experienced 
translators and reviewers. For ICILS, IEA recommended using at least one translator and one 
translation reviewer for each target language, or more if only a limited amount of time was 
available. Ideally, the selected individuals should fulfill the following criteria: 


• Translators and reviewers should have an excellent knowledge of English.
• Translators and reviewers should be native speakers of the target language.
• Translators should be familiar with survey instruments.
• Translators should have experience in translating electronic texts, such as websites.


In the ideal workflow, the selected translator first translated all materials. Next, the reviewer reviewed the translations, commented on their appropriateness, and proposed improvements. The NRC then finalized the materials before submitting them for translation verification. In countries with more than one administered language, the same process applied to each language. In addition, these countries needed to perform equivalence checks between the two or more language versions.


In countries where English was the administered language, the process of instrument preparation was identical to that for all other languages, except that the source version of the instruments did not require translation, only adaptation to national English usage.


Guidelines for translation and adaptation 


Due to the complexity of languages and national contexts, it is difficult to provide participating countries and benchmarking participants with explicit guidelines for translation and adaptation. However, IEA provided NRCs with general guidelines and recommendations (included in the Survey Operations Procedures Unit 3) intended to help them produce study instruments that are comparable to the international source version.






During the translation process, translators, reviewers, and NRCs should pay particular attention to the following:

• Cultural and contextual differences should be identified and minimized.
• Terms and phrases in the target language must be equivalent to those in the international version.
• The essential meaning of the text and the reading level should be maintained.
• The translated text should reflect the same language level and degree of formality as the source text.
• The translated text should have correct grammar and usage.
• The translated text should not clarify or omit text from the source version, and should not add information.
• The translated text should have equivalent qualifiers and modifiers appropriate for the target language.
• The translated text should have the equivalent social, political, or historical terminology appropriate for the target language and used at this level of education.
• Idiomatic expressions should be translated appropriately and not literally.
• Translators should avoid vocabulary, expressions, and concepts that may be unfamiliar to the target audience.
• Spelling, punctuation, and capitalization in the target text should be appropriate for the target language and the country’s national context.
• National adaptations should be documented and implemented appropriately.
• Changes in layout due to translations should be reduced to a minimum.

Target languages

All participating countries and benchmarking participants in ICILS 2018 had to translate the student instruments into the relevant languages in their country. The selection of target languages usually includes all official languages of a country that are used in public areas of society or are the dominant languages of instruction. Minority languages are taken into account in some countries. The decision on whether a language was administered mainly depended on the sampling frame and on whether minority schools were selected. In these cases, participants had to prepare their study instruments in all required languages. Most of the 14 ICILS 2018 participants (12 countries and two benchmarking participants) administered only one language. However, two countries administered ICILS in two languages, namely Kazakhstan (Russian and Kazakh) and Finland (Finnish and Swedish). Luxembourg was the only country administering three languages (English, French, and German). In total, 17 different language versions were administered in ICILS 2018. Table 5.1 shows all administered languages in ICILS 2018.


ICILS instruments requiring translation and adaptation 


The ICILS 2018 study instruments included two main components: the student assessment and the context questionnaires. The student assessment comprised seven different modules, of which five were related to computer and information literacy (CIL) and two were related to computational thinking (CT). Three of the five CIL modules were trend modules and had therefore also been administered in ICILS 2013. For countries that participated in both ICILS 2013 and ICILS 2018, these trend modules were used to compute trend measurement on the achievement scale. In order to ensure a valid trend measurement, the translations of the trend modules needed to remain identical to those used in the previous cycle. The administration of the two CT modules (new to ICILS in 2018) was optional.




Table 5.1: Languages used for the ICILS 2018 study instruments 


Participating country                    Administered language
Chile                                    Spanish
Denmark                                  Danish
Finland                                  Finnish, Swedish
France                                   French
Germany*                                 German
Italy                                    Italian
Kazakhstan                               Kazakh, Russian
Korea, Republic of                       Korean
Luxembourg                               English, French, German
Portugal                                 Portuguese
United States                            English
Uruguay                                  Spanish

Benchmarking participants
Moscow (Russian Federation)              Russian
North Rhine-Westphalia (Germany)*        German

Notes: *The same translations were used in Germany and North Rhine-Westphalia.


In addition to the student assessment modules, NRCs had to prepare the following context questionnaires for data collection:
• Student questionnaire
• Teacher questionnaire
• Principal questionnaire
• ICT coordinator questionnaire


National study centers also had to translate and adapt the school coordinator and test administrator manuals as well as the scoring guides for constructed-response items. However, the ISC and IEA did not monitor the translation and adaptation process for these documents and did not perform any of the verification processes on them.


Translation systems 


In order to prepare the ICILS 2018 study instruments for the main survey, the ISC and IEA provided national study centers with two online translation systems: the IEA Translation System and the AssessmentMaster (AM) system from RM Results, AM Designer (see Chapter 3 for more details about the two systems). The online translation systems served as the main platform for all involved parties to produce high-quality national instruments and to communicate with one another.

The instrument preparation process of the ICT coordinator, principal, and teacher questionnaires took place in the IEA Translation System, whereas AM Designer was used for the student assessment and the student questionnaire. Both systems allowed users to apply and edit their translations. The systems also kept a full record of the editing history for the text and enabled all users to retrace changes made by other users, ensuring transparency of the translation process. In addition, the systems displayed the differences between a previously saved translation and the current translation in the form of track changes. This function supported NRCs in their reviewing process.
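The track-changes display described above compares a previously saved translation with the current one. A minimal word-level sketch using Python's standard `difflib` module is shown below. It assumes nothing about how the IEA Translation System or AM Designer actually compute or render differences; the `track_changes` function and its `[-word-]`/`{+word+}` markup are hypothetical.

```python
import difflib


def track_changes(previous: str, current: str) -> str:
    """Mark words removed from `previous` as [-word-] and words added in `current` as {+word+}."""
    old_words = previous.split()
    new_words = current.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
        if tag in ("delete", "replace"):
            out.extend(f"[-{w}-]" for w in old_words[i1:i2])
        if tag in ("insert", "replace"):
            out.extend(f"{{+{w}+}}" for w in new_words[j1:j2])
    return " ".join(out)


print(track_changes("Click the search button", "Click the blue search button"))
# Click the {+blue+} search button
```

Working on words rather than characters keeps the marked-up output readable for reviewers; a character-level diff would be more precise for spelling corrections.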




International verification processes 


After the international instruments were adapted to the national context, translated, and internally reviewed by the national centers, the national versions of the instruments were submitted for external verification, which consisted of a rigorous three-part verification process: (1) adaptation verification, (2) translation verification, and (3) layout verification.


Adaptation verification 


NRCs were asked to consult with ISC staff for a review of all proposed national adaptations. The ISC particularly emphasized the need for NRCs to discuss any adaptation that might result in a serious deviation from the international instruments. National centers began completing the NAF (Version I) after reviewing the international version of the survey instruments. They then submitted the NAF for consultation with the ISC, where two staff members reviewed the adaptations independently. Once completed, these reviews were consolidated into one document, which was then provided to the national centers with feedback on their adaptations and, where necessary, suggestions for better alignment with the international source version.

Common issues identified during the review of adaptations included the following:

• Inconsistent use of adaptations within or across modules and questionnaires;
• Fictional names for software or technologies not considered to be equivalent to the source version; and
• Difficulties in establishing country-appropriate adaptations for ISCED levels.


The ISC asked the national centers to take the recommendations into account and update the forms accordingly. The NAF was only finalized once the ISC and the national center were in agreement about all adaptations. The updated forms (Version II) then informed the translation verification process.


Translation verification 


Translation verification is the second of the three verification steps conducted during instrument preparation. The goal of this process is to ensure high-quality translations of the national ICILS 2018 instruments and their equivalence to the international source version. IEA managed and coordinated this process in cooperation with cApStAn Linguistic Quality Control.


International translation verifiers and their responsibilities 


The contracted, professional, international translation verifiers were native speakers of the administered language. They were certified translators working in English, held a university degree, and ideally lived in the target country or at least had experience working in the country context. Verifiers were required to attend a training seminar, where they received information about the ICILS study design and online environment as well as instructions on the verification specifications and requirements.


The translation verifiers’ main responsibilities included reviewing the translations for the target country and evaluating the accuracy and comparability of the national version of the ICILS instruments to the international version. Their tasks further comprised documenting all deviations in the country’s translations and adaptations and suggesting alternatives to improve the quality of the translations. Verifiers checked whether the meaning and reading level of the text had been affected and whether the test items had been made easier or more difficult. They also ensured that no information was added or omitted. Translation verifiers made sure that the instruments contained all the correct items and response options in the right order and that adaptations had been recorded and implemented correctly.




In addition to the general verification of the translations, international verifiers had to perform a trend check for countries that participated in ICILS 2013. To ensure the quality and accuracy of trend measurement on the ICILS achievement scale, it was of high importance to maintain the exact wording of the translation that countries used in the previous ICILS cycle. This was applicable to four countries that participated in ICILS 2018. IEA provided the verifiers with files that contained the final translations from ICILS 2013. The verifiers’ task was to compare the translation the country submitted for translation verification with the version of the last cycle and flag any detected differences.
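The trend check described above amounts to an exact comparison of each trend text element against its final translation from the previous cycle. The Python sketch below assumes, hypothetically, that both versions are available as mappings from a text-element ID to its text; the `trend_check` function, the element IDs, and the flag messages are illustrative and are not part of the IEA verification tools.

```python
from typing import Dict


def trend_check(previous_cycle: Dict[str, str], submitted: Dict[str, str]) -> Dict[str, str]:
    """Flag trend text elements whose submitted wording is not identical to the previous cycle."""
    flags = {}
    for element_id, old_text in previous_cycle.items():
        new_text = submitted.get(element_id)
        if new_text is None:
            flags[element_id] = "missing in submitted translation"
        elif new_text != old_text:
            # Exact comparison: any change, even to punctuation or spacing, is flagged.
            flags[element_id] = "differs from previous cycle"
    return flags


previous = {"T1-Q1": "Open the file.", "T1-Q2": "Save the document."}
submitted = {"T1-Q1": "Open the file.", "T1-Q2": "Save the file."}
print(trend_check(previous, submitted))  # {'T1-Q2': 'differs from previous cycle'}
```

An exact string comparison matches the requirement that trend wording remain identical; any flagged element would still need a verifier's judgment and, per the severity codes, would be reported as a major issue.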


Translation verification procedure and verifiers’ feedback 


The translation verification process took place in the two previously described online translation systems. International verifiers could apply corrections or suggestions directly to the text elements in the online environment. Any changes made by the verifier were displayed as track changes in the text element. In addition, verifiers were required to document their corrections and suggestions, describe the issue, and provide an English back translation in the form of comments for every affected text element and in the NAF. To simplify the verification process and to make the verifiers’ feedback easier to understand, verifiers assigned so-called severity codes to every change or suggestion. They used the following codes to indicate the severity of the issues they found:


• Code 1: Major change or error. The translation contained a severe error that affected the meaning or difficulty of the item. Examples included mistranslations, translations that change the meaning of a question, omitted or added information, incorrect order of questions or response options, and incorrectly implemented national adaptations. Verifiers also used this code to flag differences for trend translations.
• Code 1?: Used by verifiers whenever they were in doubt about the severity of an issue or unsure of how to correct a possible error.
• Code 2: Minor change or error. The translation contained a minor error that did not affect comprehension. Examples included spelling errors, grammatical errors, and syntax errors that did not affect the comprehensibility of the question.
• Code 3: Suggestion for alternatives. Used by verifiers to suggest an alternative wording for an otherwise appropriate translation.
• Code 4: Acceptable change. The translation was deemed acceptable and appropriate. Also used by verifiers to indicate that a national adaptation had been documented and implemented correctly.
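The severity codes listed above form a small fixed vocabulary, so verifier comments can be tallied per code when summarizing feedback for a national center. The Python sketch below encodes the five codes from the list; the enum member names and the `summarize` helper are hypothetical labels chosen for illustration, not part of the IEA or cApStAn tooling.

```python
from collections import Counter
from enum import Enum
from typing import Iterable


class Severity(Enum):
    MAJOR = "1"        # severe error affecting meaning or difficulty; also trend differences
    UNSURE = "1?"      # verifier in doubt about severity or about how to correct
    MINOR = "2"        # minor error that does not affect comprehension
    SUGGESTION = "3"   # alternative wording for an otherwise appropriate translation
    ACCEPTABLE = "4"   # acceptable change or correctly implemented adaptation


def summarize(codes: Iterable[str]) -> Counter:
    """Count verifier comments per severity code; an unknown code raises ValueError."""
    return Counter(Severity(code) for code in codes)


summary = summarize(["1", "2", "2", "1?", "4"])
print(summary[Severity.MINOR])  # 2
```

Looking codes up by value (`Severity("1?")`) rejects any code outside the agreed scheme instead of silently counting it.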


After translation verification, IEA reviewed the verifiers’ feedback and returned the materials to the national centers. NRCs were responsible for finalizing their instruments and thus had to review the feedback carefully and consider the corrections and suggestions made by the verifier.

The procedure required NRCs to react to every comment the verifier made in the IEA Translation System, AM Designer, and the NAF. In principle, NRCs could accept, modify, or reject the verifier’s suggestion. For the latter two options, NRCs had to explain why they changed or disagreed with the suggestion.
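The response rule described above (accepting needs no justification, whereas modifying or rejecting a verifier's suggestion requires an explanation) can be checked mechanically. The following Python sketch is a hypothetical illustration; the `Response` enum and `check_response` function are not part of the IEA Translation System or AM Designer.

```python
from enum import Enum
from typing import Optional


class Response(Enum):
    ACCEPT = "accept"
    MODIFY = "modify"
    REJECT = "reject"


def check_response(response: Response, explanation: Optional[str]) -> bool:
    """Return True if an NRC's reaction to a verifier comment is complete.

    Accepting needs no explanation; modifying or rejecting requires a non-empty one.
    """
    if response is Response.ACCEPT:
        return True
    return bool(explanation and explanation.strip())


print(check_response(Response.REJECT, ""))  # False
```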


Some of the typical errors found by the translation verifiers during the verification work included mistranslations, literal translations, inconsistencies of terms or phrases, omissions or additions of text, undocumented adaptations, grammar and punctuation issues, and spelling mistakes. Some domain-specific concepts, such as ICT-related technical terminology, were particularly challenging to translate into some languages. According to the survey activities questionnaire, which NRCs completed to provide feedback on survey operations, almost all participants found the translation verification feedback very useful (8) or at least somewhat useful (3). The majority of the national centers (10) reported that they did not experience any major problems during the translation verification process. The constructive feedback helped NRCs revise their materials and improve the quality of their national versions in line with the translation guidelines for ICILS 2018.

As an additional quality control step, international quality control observers reviewed the implementation of the translation verification feedback after the data collection period and documented irregularities (see Chapter 9 for quality assurance procedures).


Layout verification of student instruments

Once translation verification had been completed, the ISC asked national centers to make any changes resulting from translation verification and to notify the ISC that their student materials were ready for layout verification. In AM Designer, they were asked to change the status of their materials to “Ready for Layout Verification.” The ISC accessed these materials and documented all issues in a worksheet added to the NAF. The layout issues in each set of instruments were grouped according to whether they were general layout issues relating to the set of instruments or whether they related to a specific question or specific group of questions within an instrument.

The most common layout issues that were identified were:

• Missing or additional line breaks;
• Unrealistic URLs or email addresses;
• Text not fitting within the pre-defined spaces;
• Issues with HTML tags;
• Extra spaces; and
• Issues with structural changes to national content in the questionnaires.

Technical issues relating to the translation software were often fixed by the verifiers on behalf of the national center or, in some cases, referred to the software developers.

National centers were provided with a summary of all layout issues. In cases where layout issues were considered minor, national centers were given feedback and were asked to make the appropriate changes to their materials without the need for further verification. In cases where more substantial layout issues were identified, national centers were provided with detailed feedback concerning all issues and were asked to resubmit their materials for further layout verification.
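The grouping and follow-up logic described above (general versus question-specific issues; minor issues fixed without re-verification, substantial issues requiring resubmission) can be sketched in Python. This is a hypothetical illustration: the `LayoutIssue` record, its `substantial` flag, and both helper functions are assumptions, and the report gives no formal threshold for what counted as a substantial issue.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LayoutIssue:
    description: str
    question_id: Optional[str] = None  # None means a general issue for the whole set of instruments
    substantial: bool = False          # hypothetical flag; no formal threshold is given in the report


def group_issues(issues: List[LayoutIssue]) -> Dict[str, List[str]]:
    """Group issue descriptions under 'general' or under the affected question ID."""
    groups: Dict[str, List[str]] = {}
    for issue in issues:
        key = issue.question_id or "general"
        groups.setdefault(key, []).append(issue.description)
    return groups


def needs_resubmission(issues: List[LayoutIssue]) -> bool:
    """Materials go back for further layout verification only if any issue is substantial."""
    return any(issue.substantial for issue in issues)


issues = [
    LayoutIssue("Extra spaces"),
    LayoutIssue("Text not fitting within the pre-defined spaces", question_id="Q5", substantial=True),
]
print(group_issues(issues))
print(needs_resubmission(issues))  # True
```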


Layout verification of teacher, principal, and ICT coordinator questionnaires

All countries and benchmarking participants administered the teacher, principal, and ICT coordinator questionnaires online. Once the content for these questionnaires was finalized, countries were asked to notify IEA, which prepared each questionnaire accordingly by importing the translations from the IEA Translation System into the IEA Online Survey System. IEA asked the countries for clarification of unclear or unresolved translation issues following translation verification. Otherwise, all layout issues identified were fixed by IEA. Countries were then provided with the questionnaires for review and could submit any request for changes to IEA to implement.

Some countries also opted to implement some questionnaires using paper-based forms. These countries were provided with paper questionnaires, which IEA extracted from the IEA Translation System. Prior to sending the paper questionnaires to the countries, and to ensure that data from both administration modes were comparable, IEA conducted a systematic check of the paper and online questionnaires. Apart from a few inevitable exceptions, any deviations with regard to content and layout between paper and online instruments were reported back to the countries.




Before the national questionnaires were published online, the layout and structure of all online questionnaires were checked thoroughly by IEA. Visual checks were run using the same standards and procedures as for the verification of the paper layout. In addition, the structure of the national online instruments was checked against the structure of the international online instruments (e.g., number of categories or width of non-categorical questions). Only intended deviations that were documented in the NAF were approved.

Once all identified inconsistencies had been fixed, the questionnaires were published online. At this stage, the country-specific URLs were activated and set to “review mode,” which is a built-in safeguard within the IEA Online Survey System that prevents actual respondents from logging into the online questionnaires before finalization. Without this safeguard, users would be able to access the questionnaires and view the welcome page. Before the online questionnaires went live, i.e., before actual respondent data were collected, the questionnaires underwent the following two review steps:

(1) The ISC conducted a layout check. All inconsistencies were documented and sent to IEA for implementation.

(2) As a last check, the NRCs were asked to carry out a final review of the questionnaires in the online environment. In a few cases, this resulted in additional minor changes (e.g., correction of spelling errors).


Only after the finalization of the online questionnaires were respondents notified and provided 
the link and login information to access the questionnaires. 


Summary 


To ensure high-quality translations and comparability of the national instruments prepared by national centers to the international instruments, the 2018 cycle of ICILS incorporated stringent procedures for adaptation, translation, and international verification, similar to those in the previous cycle. NRCs were provided with a comprehensive set of guidelines and manuals that facilitated the production of national instruments. In each country, the preparation of national instruments started with the adaptation and translation of the international version of the ICILS 2018 materials into the identified language(s) of administration. Upon completion of the translations and adaptations, the national instruments first underwent international adaptation verification and then international translation verification, which involved a thorough review of the translated materials by external linguistic experts. After these two steps, the layout of all national materials was verified. Taken together, these stages of national instrument production ensured that the adaptations, translations, and layout of all national study instruments underwent thorough verification.

CHAPTER 6: 


Sampling design and implementation 


Sabine Tieck 


Introduction 


This chapter provides an introduction to the sampling design and the implementation of the ICILS 2018 student, teacher, and school survey, which are consistent with those used in ICILS 2013 (see Meinck 2015). International comparative surveys require samples to be drawn randomly to guarantee valid inferences from the observed (sampled) data on the population features under study. Randomness in the selection of study participants is a key criterion to ensure international comparability or comparability between subnational entities. ICILS determined a survey design that could be considered to meet the international requirements for sampling quality specified in the Technical Standards for IEA Studies (Martin et al. 1999).

The ICILS 2018 sample design is referred to as a “complex” design because it involves multi-stage sampling, stratification, and cluster sampling. This chapter describes these features and provides details on target population definitions, design features, sample sizes, and achieved design efficiency. It focuses on presenting the international standard sampling design, whereas specific characteristics of each national sampling plan are given in Appendix B.

The samples of schools, students, and teachers within each ICILS country and education system were selected or verified by IEA in collaboration with the national research coordinators (NRCs) of each participating country or educational system. A series of manuals and the IEA Windows Within-School Sampling Software (IEA WinW3S) supported NRCs in their sampling activities.


The sampling referee Marc Joncas gave advice on sampling methodology, and reviewed and adjudicated all national samples for this study.


Target population definitions 


When conducting a cross-country comparative survey, it is important to clearly define the target population(s) under study. ICILS collected information from students, their teachers, and their schools, which required clear definitions for all three populations. The definitions enabled ICILS NRCs to correctly identify and list the targeted schools, students, and teachers from which samples were to be selected.


Definition: Students 
ICILS defined the target population of students as follows: 


The student target population in ICILS consists of all students enrolled in the grade that represents eight years of schooling, counting from the first year of ISCED Level 1,¹ providing the mean age at the time of testing is at least 13.5 years.


For most countries, the target grade was the eighth grade or its national equivalent. Italy followed the international population definition but decided to survey its grade 8 students (and their teachers) at the beginning of the school year. Due to this deviation, its results were annotated accordingly in the international report. To ensure international comparability, the ICILS NRCs had to specify their country’s legal school entry age, the name of the target grade, and an estimate of the mean age of the students in that grade.

Hereafter, the term “students” is used to describe “students in the ICILS target population.”

1 ISCED stands for International Standard Classification of Education (UNESCO 1997).


Definition: Teachers 
ICILS 2018 defined the target population of teachers as follows: 
Teachers are defined as school staff members who provide student instruction through the delivery of 


lessons to students. Teachers may work with students as a whole class in a classroom, in small groups in 
resource rooms, or one-to-one inside or outside of classrooms. 


The teacher target population in ICILS consists of all teachers that fulfill the following conditions: They are teaching regular school subjects to students of the target grade (regardless of the subject or the number of hours taught) during the ICILS testing period and since the beginning of the school year.²


School staff from the following categories were not regarded as part of the target teacher 
population (i.e., were out of scope): 


• Any school staff that attend to the needs of target-grade students but do not teach any lessons (e.g., psychological counselors, chaplains);
• Assistant teachers and parent-helpers;
• Non-staff teachers who teach (non-compulsory) subjects that are not part of the curriculum (e.g., cases where religion is not a regular subject and is taught by external persons); and
• Teachers who have joined a school after the official start of the school year.


Hereafter, the term “teachers” is used to describe “teachers in the ICILS target population.”


Definition: Schools 


In ICILS 2018, schools were defined as follows:


A school is one whole unit with a defined number of teachers and students, which can include different 
programs or tracks. The definition of “school” should be based on the environment that is shared by 
students, which is usually a shared faculty, set of buildings, social space and also often includes a 
shared administration and charter. 


Schools eligible for ICILS are those at which target grade students are enrolled.


In order to ensure international comparability, the definition of “school” should be equivalent in 
all participating countries. In most cases, identifying schools for sampling purposes in ICILS was 
straightforward. However, there were some cases where identification of schools for sampling 
purposes was more difficult. National centers were provided with the following examples in order 
to help them identify sampling units: 


• Sub-units of larger “campus schools” (administrative “schools” consisting of smaller schools from different cities or regions) should be regarded as separate schools for sampling purposes. If a part of a larger campus school was selected for ICILS 2018, the principal or ICT coordinator of the combined school was asked to complete the school questionnaire with respect to the sampled sub-unit only.
• Schools that consist of two administrative units but have shared staff and shared buildings, and that offer some opportunities for students to change from one school to the other, should be regarded as one combined school for sampling purposes.
• The parts of a school with two or more different study programs that have different teaching staff, take place in different buildings, and offer no opportunity for students to change from one study program to the other should be regarded as two or more separate schools for sampling purposes. The study programs should be listed as separate units on the school sampling frame.


2 Teachers who are on long-term leave during the testing period (e.g., maternity or sabbatical leave) are not within the scope of ICILS.


Coverage and exclusions 


Population coverage 


The ICILS consortium encouraged participating countries to include all schools, students, and teachers defined in the target populations in the study in order to ensure full coverage of these target populations.


However, it was deemed appropriate to exclude some schools, students, and teachers from the 
target population for practical reasons, such as difficult test conditions or prohibitive survey costs. 
Some students and teachers were not surveyed due to the removal of their entire school from the sampling frame (school-level exclusions), while some students (but not teachers) were excluded within participating schools (within-sample exclusions).


As in other large-scale assessments, it should be emphasized that the ICILS 2018 samples represent only the nationally defined target populations (i.e., the internationally defined target population without its excluded members).


School-level exclusions 

Table 6.1 gives an overview of the types of exclusions of schools and their respective percentages of all schools in the desired national target population within each participating country. The (school-level) percentages $ER_{sch}$ were computed as:

$$ER_{sch} = \frac{E_{sch}}{TP_{sch}} \times 100$$

$E_{sch}$ denotes the number of schools excluded prior to sample selection, and $TP_{sch}$ is the total number of schools belonging to the national desired target population. The respective figures were provided by the NRCs.
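As a minimal illustration of this formula, the Python sketch below computes $ER_{sch}$ from two counts. The function name and the example counts are hypothetical; they are not ICILS data.

```python
def school_exclusion_rate(excluded_schools: int, target_schools: int) -> float:
    """Return ER_sch: the percentage of schools in the national desired target
    population that were excluded prior to sample selection."""
    if target_schools <= 0:
        raise ValueError("the target population must contain at least one school")
    if not 0 <= excluded_schools <= target_schools:
        raise ValueError("excluded schools must lie between 0 and the target total")
    return excluded_schools / target_schools * 100


# Hypothetical example: 52 of 1,040 schools excluded before sampling.
print(round(school_exclusion_rate(52, 1040), 1))  # 5.0
```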


In most countries, very small schools and schools exclusively dedicated to students with special needs were excluded. Frequently, schools following a curriculum that differed from the mainstream curriculum were also not part of the nationally defined target population. Because school-level data (collected via the principal and ICT coordinator questionnaires) in the ICILS 2018 survey were only used to complement the reporting of student- and teacher-level data, no specific thresholds were determined for exclusions at the school level. However, the percentages of students and teachers excluded due to the removal of entire schools were considered when determining the overall proportions of excluded students (see below).


School exclusions differed considerably across countries, a point that should be kept in mind when interpreting results from school-level data. Note also that because school exclusions typically concern small schools, the percentages of excluded schools tend to be higher than the corresponding percentages of excluded students or teachers.


Table 6.1: Percentages of schools excluded from the ICILS target population

Country | Type of exclusion | Excluded schools (% of all schools)
Chile | Very small schools (fewer than six students) | 5.0
 | Special needs schools | 0.1
 | Geographically inaccessible | 0.1
 | Total | [illegible]
Denmark | Very small schools (fewer than five students) | 7.0
 | Special needs schools | [illegible]
 | Treatment centers | 1.6
 | German, English, and Waldorf schools | [illegible]
 | Total | 14.9
Finland | Special needs schools | 9.5
 | Language schools (instructional language not Finnish or Swedish) | 1.0
 | Total | 10.5
France | Overseas territories (TOM) | 1.4
 | Mayotte | 0.3
 | Private schools without contract | 8.3
 | Specialized schools | 0.8
 | Total | 10.8
Germany | Special needs schools | 8.9
 | Very small schools (fewer than three students) | 1.4
 | Total | 10.4
Italy | Very small schools (fewer than six students) | 0.2
 | Special needs schools | 0.1
 | Students taught in Slovene | 0.1
 | Schools in remote geographical areas or on small islands | 0.1
 | Total | 0.5
Kazakhstan | Students are taught in Uzbek language | 1.2
 | Students are taught in Uighur language | 0.2
 | Students are taught in Tadjik language | 0.1
 | Students are taught in other language | 0.0
 | Special needs schools | 1.3
 | Very small schools (fewer than four students) | 5.8
 | Total | 8.6
Korea, Republic of | Very small schools (fewer than five students) | 1.8
 | Geographically inaccessible schools | 4.4
 | Physical education school | 0.3
 | Total | 6.6
Luxembourg | No exclusions on school level | 0.0
Portugal | Very small schools (fewer than seven students) | 2.0
 | International schools | 1.1
 | Total | [illegible]
United States | No exclusions on school level | 0.0
Uruguay | Special needs schools | 0.2
 | Geographically remote schools | 8.8
 | Total | [illegible]
Benchmarking participants
Moscow (Russian Federation) | Special needs schools | 2.5
 | Very small schools (fewer than seven students) | 4.8
 | Total | [illegible]
North Rhine-Westphalia (Germany) | Special needs schools | 9.7
 | Very small schools (fewer than three students) | 0.2
 | Total | 9.8


Student-level exclusions 


Each country was required to keep the overall rate of excluded students (due to school-level and within-school exclusions) below five percent (after rounding) of the desired target population. In three education systems participating in ICILS 2018, the overall exclusion rate was above five percent, which resulted in respective annotations in the ICILS 2018 international report (Fraillon et al. 2020). Table 6.1 and Appendix B of this report provide details about the exclusion types for each country.

The overall exclusion rate of students is the sum of the students’ school-level exclusion rate and the weighted within-sample exclusion rate. Table 6.2 provides the respective percentages for ICILS 2018 countries.


Table 6.2: Percentages of students excluded from the ICILS target population

Country | Students’ school-level exclusions (%) | Within-sample exclusions (%) | Overall exclusions (%)
Chile | 0.5 | 0.8 | 1.3
Denmark | [illegible] | 4.4 | [illegible]
Finland | 1.6 | 2.4 | 4.0
France | 3.4 | 1.3 | 4.7
Germany | 1.5 | 2.9 | 4.3
Italy | [illegible] | 2.9 | 3.0
Kazakhstan | 3.4 | 2.1 | 5.6
Korea, Republic of | 0.9 | 0.6 | 1.5
Luxembourg | 0.0 | 3.9 | 3.9
Portugal | 0.8 | 8.0 | 8.9
United States | 0.0 | 5.0 | 5.0
Uruguay | 1.4 | 0.0 | 1.4
Benchmarking participants
Moscow (Russian Federation) | 0.7 | 2.3 | 3.0
North Rhine-Westphalia (Germany) | 1.4 | [illegible] | 4.6


Students’ school-level exclusions consisted of those students belonging to schools that were excluded prior to school sampling. The students’ school-level exclusion rate $ER_1$ was calculated as:

$$ER_1 = \frac{E_1}{TP} \times 100$$

$E_1$ denotes the number of target grade students in excluded schools, and $TP$ is the total number of students belonging to the national desired target population. The respective figures were provided by the NRCs. Students’ within-sample exclusions were based on information collected from the sample (i.e., after the school sampling step). The within-sample exclusions consisted of students with physical or mental disabilities or students who could not speak the language of the test (usually, students with less than one year of instruction in the test language). Students could be excluded prior to the within-school sampling or after the within-school sampling was performed.³ The percentage of student within-school exclusions was calculated using the number of students excluded within schools and the total number of students belonging to the national desired target population.

The students’ within-sample exclusion rate $ER_2$ was computed as:

$$ER_2 = \frac{W_E}{W_P + W_E} \times \left(1 - \frac{ER_1}{100}\right) \times 100$$

$W_E$ is the sum of weights of excluded students and $W_P$ is the sum of weights of participating students. Therefore, $W_P + W_E$ denotes the (estimated) total number of target grade students in the schools that were not excluded at the school level. The students’ school-level exclusion rate is taken into account by multiplying by $(1 - ER_1/100)$, the proportion of the desired target population that these schools represent.

The overall exclusion rate of students $ER_{tot}$ is the sum of the students’ school-level exclusion rate and the weighted within-sample exclusion rate:

$$ER_{tot} = ER_1 + ER_2$$
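The three student exclusion rates can be combined as in the hedged sketch below. It assumes $ER_1$ is expressed as a percentage (so the rescaling factor is $1 - ER_1/100$). The function names and the example counts and weight sums are hypothetical, chosen so that the weights of participating and excluded students add up to the target-grade enrollment of the non-excluded schools.

```python
def school_level_rate(e1: float, tp: float) -> float:
    """ER_1: % of the desired target population enrolled in excluded schools."""
    return e1 / tp * 100


def within_sample_rate(w_excluded: float, w_participating: float, er1: float) -> float:
    """ER_2: weighted share of excluded students in the sampled (non-excluded)
    schools, rescaled by the share (1 - ER_1/100) of the population they cover."""
    share = w_excluded / (w_participating + w_excluded)
    return share * (1 - er1 / 100) * 100


def overall_rate(er1: float, er2: float) -> float:
    """ER_tot = ER_1 + ER_2."""
    return er1 + er2


# Hypothetical figures: 100,000 target-grade students, 2,000 in excluded schools;
# weight sums of 96,500 (participating) and 1,500 (excluded) in sampled schools.
er1 = school_level_rate(2_000, 100_000)
er2 = within_sample_rate(1_500, 96_500, er1)
print(round(er1, 1), round(er2, 1), round(overall_rate(er1, er2), 1))  # 2.0 1.5 3.5
```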


Table 6.2 provides the respective percentages for ICILS 2018 countries. 


Teacher-level exclusions 


Teachers working in excluded schools were not part of the nationally defined target population. 
Within participating schools, all teachers who met the target population definition were eligible 
for participation in the survey. 


Each country was asked to provide information about the total number of teachers teaching in the 
target grade as well as the proportion of teachers teaching in the target grade in excluded schools. 
For Germany, Finland, and North Rhine-Westphalia (Germany), no statistics on the number of 
eligible ICILS 2018 teachers were available and therefore it was not possible to compute exclusion 
rates. Teacher exclusion rates exceeded five percent in Denmark, France, and the United States. 


3 In some cases, these students were grouped in classes, which were then excluded as a group, or the sampled schools were found to have enrolled only students within the exclusion categories, which resulted in the corresponding school(s) being excluded ex post.


School sampling design 


IEA used a stratified two-stage probability cluster sampling design in order to conduct the school sample selection for all ICILS 2018 countries. During the first stage, schools were selected systematically with probabilities proportional to their size (PPS) as measured by the total number of enrolled target grade students.

During the second stage, within participating schools, students enrolled in the target grade were selected using a systematic simple random sample approach.

The following subsections provide further details on the sample design for ICILS 2018.


School sampling frame 


In order to prepare the selection of school samples, national centers provided a comprehensive list of schools including the numbers of students enrolled in the target grade. This list is referred to as the school sampling frame. To ensure that each ICILS 2018 school sampling frame provided complete coverage of the desired target population, the sampling team carefully checked and verified the plausibility of the information by comparing it with official statistics.


The sampling team required the following information for each eligible school in the sampling frame:

• A unique identifier, such as a national identification number;
• School’s measure of size (MOS), which was usually the number of students enrolled in the target grade or an adjacent grade; and
• Values for each of the intended stratification variables.


Stratification of schools 


Stratification is part of many sampling designs and entails the grouping of sampling frame units by common characteristics. Examples for such groups of units (schools in the case of ICILS 2018) would be geographic region, urbanization level, source of funding, or performance level.

ICILS 2018 applied two different methods of stratification. Implicit stratification means that the sampling frame has been sorted, prior to sampling, by implicit stratification variables, thus providing a simple and straightforward method to achieve a fairly proportional sample allocation across all strata. With explicit stratification, independent samples of schools are selected from each explicit stratum. The sample sizes for each explicit stratum are assigned before the selection process in order to achieve the desired sample precision overall and, where required, also for subpopulations.

Generally, IEA studies use stratification for the following reasons:

• They use implicit or explicit stratification to improve the efficiency of the sample design, thereby making survey estimates more reliable and reducing standard errors (to this end, national centers identify stratification variables expected to be closely associated with students’ learning-outcome variables).
• They use explicit stratification to apply disproportionate sample allocations to specific groups of schools. The latter design feature was used if the country required estimates with higher precision levels for specific subgroups of interest in the target population.

For example, a country may wish to compare public and private schools, but only 10 percent of the students in that country attend schools in the latter category. In such a case, a proportional sample allocation would result in having too few private schools in the sample to provide reliable estimates for this particular subpopulation, and even relatively large differences between the two school types might not appear as statistically significant.
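Implicit stratification, as described above, amounts to sorting the sampling frame before systematic selection. The sketch below assumes a frame represented as a list of dictionaries with hypothetical keys `id`, `implicit` (implicit stratum), and `mos`. It sorts by implicit stratum and, within each implicit stratum, by MOS in alternating ascending and descending order (the alternating order described in the school sample selection steps of this chapter). It is an illustration only, not the IEA sampling software.

```python
from itertools import groupby


def serpentine_sort(frame: list[dict]) -> list[dict]:
    """Sort schools by implicit stratum, then by MOS, alternating the MOS
    direction from one implicit stratum to the next."""
    by_stratum = sorted(frame, key=lambda school: school["implicit"])
    ordered = []
    groups = groupby(by_stratum, key=lambda school: school["implicit"])
    for position, (_, group) in enumerate(groups):
        descending = position % 2 == 1  # 1st stratum ascending, 2nd descending, ...
        ordered.extend(sorted(group, key=lambda s: s["mos"], reverse=descending))
    return ordered


frame = [
    {"id": 1, "implicit": "B", "mos": 40},
    {"id": 2, "implicit": "A", "mos": 90},
    {"id": 3, "implicit": "A", "mos": 30},
    {"id": 4, "implicit": "B", "mos": 10},
    {"id": 5, "implicit": "C", "mos": 70},
    {"id": 6, "implicit": "C", "mos": 20},
]
print([s["id"] for s in serpentine_sort(frame)])  # [3, 2, 1, 4, 6, 5]
```

One common motivation for alternating the direction is that schools on either side of an implicit stratum boundary end up similar in size, so neighbors in the sorted frame stay comparable.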


To allow comparisons with sufficient statistical power without introducing any sample bias, an appropriately larger sample of private schools could be selected while keeping the original proportional sample allocation for public schools the same. In the example above (with a distribution of 10% of students in private and 90% in public schools), a proportional sample allocation for a sample of 150 schools would lead to a sample of 15 private and 135 public schools. Through oversampling, it would be possible to increase the number of private schools to a sufficiently large number (for example, by selecting 50 private and 135 public schools).

Each country applied different stratification schemes after discussions with the IEA sampling experts. Table 6.3 provides details about the stratification variables used.

Table 6.3: Stratification schemes of participating countries


Country | Explicit stratification variables (number of variable characteristics) | Number of explicit strata | Implicit stratification variables (number of variable characteristics)
Chile | Grade (Grade 8 and 9/Grade 8 only) (2); Administration (public/private/private subsidized) (3); Urbanization level (2) | 10 | Performance level (4)
Denmark | None | [illegible] | National achievement score (5)
Finland | Language of instruction (2); Region (Helsinki & Uusimaa/Southern/Western/Northern & Eastern) (4); Urbanization (2) | 9 | Within Western, and Northern & Eastern stratum: Region (4); Within Swedish speaking strata: Urbanization (2); Within Swedish speaking school: Region (2)
France | School administration (3); Urbanization (3); Digital equipment level (2) | 18 | None
Germany | North Rhine-Westphalia/other federal states (2); Track (gymnasium/nongymnasium/special needs schools) (3) | 5 | SES indicator (3); Federal state (16)
Italy | Region (North/Central/South) (3) | 3 | Administration (2); Performance level (5)
Kazakhstan | Urbanization (urban/rural) (2); Language of instruction (4) | 8 | None
Korea, Republic of | Urbanization (3); School gender (3) | 9 | None
Luxembourg | Schools following national curriculum/Schools following other curriculum (2) | 2 | None
Portugal | Administration (2); Region (25) | 28 | None
United States | Poverty level (2); Administration (2); Region (4) | 12 | Urbanization (5); Ethnicity status (2)
Uruguay | Administration (2); Region (2); School type (2) | 6 | None
Benchmarking participants
Moscow (Russian Federation) | Performance level (5) | [illegible] | Administration (2)
North Rhine-Westphalia (Germany) | Track (gymnasium/nongymnasium/special needs schools) (3) | [illegible] | SES indicator (3)
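The private/public oversampling example earlier in this section can be reproduced with a short calculation. This sketch assumes hypothetical totals of 10,000 private and 90,000 public target-grade students and a fixed sample of 20 students per school (neither figure is ICILS data). It also assumes that each school's MOS equals its enrollment, so that under PPS school selection a student's selection probability in stratum h is n_h·m/N_h and the sampling weight is its inverse. The function names are hypothetical.

```python
def proportional_allocation(n_schools: int, student_shares: dict) -> dict:
    """Allocate schools to explicit strata in proportion to their student shares."""
    return {stratum: round(n_schools * share) for stratum, share in student_shares.items()}


def student_weight(stratum_students: int, schools_in_sample: int,
                   students_per_school: int) -> float:
    """Inverse of the selection probability n_h * m / N_h (PPS schools, fixed m)."""
    return stratum_students / (schools_in_sample * students_per_school)


students = {"private": 10_000, "public": 90_000}  # hypothetical N_h
m = 20                                            # hypothetical students per school

proportional = proportional_allocation(150, {"private": 0.10, "public": 0.90})
oversampled = {"private": 50, "public": 135}

for label, alloc in [("proportional", proportional), ("oversampled", oversampled)]:
    weights = {h: student_weight(students[h], alloc[h], m) for h in alloc}
    # Weighted totals equal N_h in both designs, so the private share stays at 10%.
    totals = {h: weights[h] * alloc[h] * m for h in alloc}
    private_share = totals["private"] / sum(totals.values())
    print(label, alloc, {h: round(w, 2) for h, w in weights.items()}, round(private_share, 3))
```

With proportional allocation the two weights are equal; oversampling lowers the weight of private-school students, which is the kind of difference that sampling weights (Chapter 7) account for.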




School sample selection

In order to select the school samples for the ICILS 2018 main survey, the sampling team used stratified PPS systematic sampling. This method is customary in most large-scale surveys in education, and notably in most IEA surveys. Under ideal conditions (that is, in the absence of non-response and where all units sampled in the last stage of sampling have a similar probability of selection), this method would lead to “self-weighted” samples. In reality, however, no single ICILS sample was self-weighting. First, note that a sampling design can only be self-weighting for one target population, and ICILS aimed for self-weighted samples of students. In turn, and by design, the samples of schools and teachers cannot be self-weighting. Second, for most countries, the samples are not self-weighting because explicit stratification was used and the sample allocation can hardly ever be exactly proportional for different explicit strata. Finally, samples of four out of 14 countries were disproportionally allocated to explicit strata in order to achieve precise estimates for subgroups.⁴

School sample selection involved the following steps:

• Splitting the school sampling frame containing all eligible schools into separate frames according to the defined explicit strata (unless no explicit stratification was used, in which case all schools from the school sampling frame were kept in one explicit stratum). All of the following steps were done independently within each explicit stratum.
• Sorting the schools within each explicit stratum by implicit strata and within each implicit stratum by MOS (alternately sorted in increasing and decreasing order).
• Calculating a sampling interval by dividing the total MOS by the number of schools to be sampled from that explicit stratum.
• Determining a random starting point, a step that determines the first sampled school.
• Selecting all following schools by adding the sampling interval to the random start and then subsequently to each new value every time a school was selected. Whenever the cumulated MOS was equal to or above the value for selection, the corresponding school was included in the sample.

Figure 6.1 visualizes the process of systematic PPS sampling within an explicit stratum. In this diagram, the schools in the sampling frame are sorted in descending order by MOS, and the height of the cells reflects the number of target-grade students in each school. A random start determines the second school in the list for selection, and a constant sampling interval determines the next sampled schools. Cells with sampled schools are shown as shaded.

Joncas and Foy (2012) provide a more comprehensive description of the sampling process, using an illustrative example.⁵

4 See Chapter 7 for more details on sampling weights.
5 The corresponding information can be accessed in the file “TIMSS and PIRLS Sampling Schools” in “Sample Design Details” at https://timssandpirls.bc.edu/methods/pdf/Sampling_Schools.pdf
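The derivation below shows why, under the ideal conditions described above, PPS selection of schools combined with a fixed within-school sample yields a self-weighting student sample within an explicit stratum. It is a sketch under hypothetical simplifying assumptions: each school's MOS equals its actual target-grade enrollment, exactly $m$ students are sampled in every school, and there is no non-response. Here $n_h$ is the number of schools sampled in explicit stratum $h$ and $M_h$ is the total MOS of that stratum.

$$
\pi_{ij} = \underbrace{\frac{n_h \, MOS_i}{M_h}}_{\text{school } i \text{ selected}} \times \underbrace{\frac{m}{MOS_i}}_{\text{student } j \text{ selected within school } i} = \frac{n_h \, m}{M_h}
$$

The school size cancels, so every student in stratum $h$ has the same selection probability and the same base weight $M_h / (n_h \, m)$. If the ratio $n_h / M_h$ differs between explicit strata, as it does under disproportionate allocation, the weights differ between strata and the overall sample is no longer self-weighting.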




Figure 6.1: Visualization of PPS systematic sampling

[Figure: 13 schools in one explicit stratum, sorted by MOS in descending order (150, 120, 100, 80, 75, 70, 65, 62, 60, 58, 55, 53, and 52 students), with the random start and the sampling interval marked.]

Source: Zuehlke 2011.
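The selection rule in the steps above can be sketched in Python as follows. The school sizes are the 13 values shown in Figure 6.1, which sum to 1,000 students. The sample size of four schools and the random start of 200 are hypothetical choices for illustration, and the function is not the IEA sampling software. A school is selected when its cumulated MOS first equals or exceeds a selection value.

```python
import random
from bisect import bisect_left
from itertools import accumulate
from typing import Optional


def pps_systematic(mos: list, n: int, start: Optional[float] = None,
                   rng: Optional[random.Random] = None) -> list:
    """Return frame positions of n schools chosen by systematic PPS sampling.

    `mos` must already be in frame order (sorted by implicit strata and MOS).
    """
    total = sum(mos)
    interval = total / n                                        # sampling interval
    if start is None:
        start = (rng or random.Random()).uniform(0, interval)  # random start
    cumulated = list(accumulate(mos))                           # cumulated MOS
    selection_values = [start + k * interval for k in range(n)]
    # bisect_left returns the first school whose cumulated MOS >= the value.
    return [bisect_left(cumulated, value) for value in selection_values]


figure_mos = [150, 120, 100, 80, 75, 70, 65, 62, 60, 58, 55, 53, 52]
print(pps_systematic(figure_mos, n=4, start=200))  # [1, 3, 7, 12]
```

With this start, the selection values are 200, 450, 700, and 950, so the random start falls within the second school (120 students). A school whose MOS exceeded the sampling interval could be hit by more than one selection value, which is why such schools are handled separately.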


In some cases, the sampling design deviated from this general procedure:

• Small schools were selected with equal selection probabilities to avoid large variations of sampling weights due to changing size measures. Usually, a school was regarded as “small” if the number of enrolled target grade students was less than 20.
• Very large schools (i.e., schools with more students than the value of the sampling interval) were placed into a separate explicit stratum and selected with certainty (i.e., all schools in this category were included in the sample).
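The two deviations listed above can be sketched as a pre-processing step applied before systematic PPS selection. In this hypothetical sketch, schools whose MOS exceeds the sampling interval are moved into a certainty stratum, and the interval is recomputed for the remaining schools until no further school qualifies. Small schools are given a common MOS equal to the threshold of 20 students so that they share one selection probability. Raising the MOS to the threshold is an assumed implementation, not necessarily the procedure IEA used.

```python
def split_certainty(mos: list, n: int) -> tuple:
    """Return (sorted positions of certainty schools, schools left to sample by PPS)."""
    certainty = set()
    while True:
        remaining = [i for i in range(len(mos)) if i not in certainty]
        n_left = n - len(certainty)
        if n_left <= 0 or not remaining:
            break
        interval = sum(mos[i] for i in remaining) / n_left
        newly_certain = {i for i in remaining if mos[i] > interval}
        if not newly_certain:
            break
        certainty |= newly_certain
    if len(certainty) > n:
        raise ValueError("more certainty schools than the requested sample size")
    return sorted(certainty), n - len(certainty)


def adjust_small_schools(mos: list, threshold: int = 20) -> list:
    """Give schools below the threshold a common MOS (hypothetical rule)."""
    return [max(size, threshold) for size in mos]


frame_mos = [900, 300, 120, 100, 80, 60, 40, 30, 15, 10]  # hypothetical stratum
print(split_certainty(frame_mos, n=4))    # ([0, 1], 2)
print(adjust_small_schools([15, 10, 40]))  # [20, 20, 40]
```

The recomputation matters: in this example the school with 300 students falls below the first interval (413.75) but exceeds the recomputed one (about 251.7) once the largest school is removed.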


All countries conducted a field trial with a small sample of schools one year before the ICILS main survey. Where possible, it is preferable that any given school is not selected to participate in both the field trial and the main survey data collection. This is because selection in both the field trial and main survey may reduce the likelihood that a school agrees to participate, and also because there may be some information sharing within a school about the contents of the instruments that could influence the data collected in the main survey. In ICILS, we prevented this by selecting the samples of schools for the field trial and the main survey simultaneously (as part of a single larger sampling procedure) in each country. For example, if a country was planning to sample 25 schools in the field trial and 150 schools in the main survey, then a single combined sample of 175 schools was selected first. The main survey sample of 150 schools was then subsampled from these schools, leaving the remaining 25 schools for the field trial sample.
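The combined selection described above can be illustrated with the sketch below. This section does not state how the 150 main survey schools were subsampled from the combined sample of 175, so the example assumes a simple random subsample with a fixed seed. The function name, school identifiers, and seed are hypothetical.

```python
import random


def split_main_and_field_trial(combined: list, n_main: int, seed: int = 2018) -> tuple:
    """Subsample the main survey schools; the remaining schools form the field trial."""
    if not 0 < n_main <= len(combined):
        raise ValueError("n_main must be between 1 and the combined sample size")
    rng = random.Random(seed)
    main_ids = set(rng.sample(combined, n_main))
    main = [school for school in combined if school in main_ids]
    field_trial = [school for school in combined if school not in main_ids]
    return main, field_trial


combined_sample = [f"school_{i:03d}" for i in range(1, 176)]  # 175 schools
main, field_trial = split_main_and_field_trial(combined_sample, n_main=150)
print(len(main), len(field_trial), set(main).isdisjoint(field_trial))  # 150 25 True
```

Because both samples come from one selection, no school can appear in both, which is the property the combined procedure is meant to guarantee.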


SAMPLING DESIGN AND IMPLEMENTATION 


ICILS 2018 was conducted in the same year as the OECD's Teaching and Learning International Survey (TALIS) 2018 and Programme for International Student Assessment (PISA) 2018, and some national education surveys (e.g., the educational trends in Germany). Several countries requested that schools should be selected for one of these studies only, to reduce the administrative burden for the schools. The ICILS 2018 sampling team collaborated closely with the staff implementing sampling for TALIS 2018 (at Statistics Canada) and for PISA 2018 (at Westat) to prevent school sample overlap whenever possible. The procedures used to prevent school overlap ensured the randomness of selection, known (correct) school selection probabilities, and consequently unbiased samples for all of these studies.

Once schools had been selected from the sampling frame, up to two (non-sampled) replacement schools were assigned for each originally sampled school. The use of replacement schools in ICILS 2018 was limited in order to reduce the risk of non-response bias due to replacement (see Chapter 7 for more details). Replacement schools were usually assigned as follows: the school that appeared directly after a selected school in the sorted sampling frame was assigned as its first replacement, while the school preceding the sampled school was used as its second replacement. This ensured that replacement schools shared similar characteristics with the corresponding sampled schools they belonged to, were of similar size, and belonged to the same stratum.⁶


Within-school sampling design

Within-school sampling constituted the second stage of the ICILS sampling process. The NRCs or their appointed data managers carried out the selection of students and teachers. The use of the IEA WinW3S software, developed by IEA, ensured the random selection of students and teachers within the sampled schools. Replacement of non-responding individuals was not permitted in ICILS 2018.


Student sampling

IEA WinW3S employed systematic stratified sampling with equal selection probabilities to select students from comprehensive lists of target grade students provided by the participating schools. Implicit stratification was applied to ensure a nearly proportional allocation among subgroups and to increase sample precision. Thus, students were sorted by gender, class allocation, and birth year prior to sampling.


Teacher sampling

As was the case for student sampling, IEA WinW3S employed systematic stratified sampling with equal selection probabilities to select teachers from comprehensive lists of in-scope teachers provided by the participating schools. The procedure also ensured a nearly proportional sample allocation among subgroups. To increase sample precision, teacher lists were implicitly stratified by sorting them according to gender, main subject domain, and birth year prior to sampling.
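The within-school procedure described above (implicit stratification by sorting, followed by systematic selection with equal probabilities) can be sketched as follows. The `Student` record, its field names, and the helper `systematic_equal_probability` are hypothetical, and this does not reproduce IEA WinW3S. Teacher lists would be handled the same way with a sort key of gender, main subject domain, and birth year.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Student:
    student_id: str
    gender: str
    class_name: str
    birth_year: int

def systematic_equal_probability(units, n, sort_key, seed=None):
    """Sort units (implicit stratification), then take every k-th unit
    from a random start, where k = N / n may be fractional."""
    ordered = sorted(units, key=sort_key)
    N = len(ordered)
    if n >= N:
        return ordered                      # everyone is selected
    rng = random.Random(seed)
    interval = N / n                        # > 1, so picks are distinct
    start = rng.random() * interval
    # min() guards against floating-point rounding at the end of the list.
    return [ordered[min(int(start + k * interval), N - 1)] for k in range(n)]

roster = [Student(f"ST{i:02d}", "F" if i % 2 else "M", f"8{'ABC'[i % 3]}",
                  2004 + i % 2) for i in range(60)]
sample = systematic_equal_probability(
    roster, 20, sort_key=lambda s: (s.gender, s.class_name, s.birth_year), seed=3)
```

Sorting before the systematic pass is what produces the near-proportional allocation: with 30 girls and 30 boys and an interval of three, each group contributes 10 of the 20 sampled students.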


6 For very large schools (see above), it is not always possible to assign two replacement schools, as the preceding school or the school listed directly after the sampled school is either another sampled school or a replacement school assigned to another sampled school. In extreme cases, no replacement school can be assigned. This could also happen if the number of schools to sample from a stratum is very high compared to the number of schools in this stratum. If the sampled school is the first or last in its stratum, usually the two following or the two preceding schools, respectively, are assigned. Please note further that only one replacement school was assigned for the field trial.
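The replacement rule in the main text, together with the edge cases in this footnote, can be sketched as follows. The function name, the input format (the sorted frame of one explicit stratum as a list of school IDs), and processing sampled schools in frame order are assumptions for illustration.

```python
def assign_replacements(frame, sampled):
    """Assign up to two replacement schools to each sampled school.

    frame:   school IDs of one explicit stratum, in sampling-frame order.
    sampled: set of originally sampled school IDs.
    Rule: first replacement = next school, second = preceding school;
    the first (last) school in a stratum takes the two following (preceding)
    schools. Schools already sampled or assigned are skipped.
    """
    used = set(sampled)
    last = len(frame) - 1
    replacements = {}
    for i, school in enumerate(frame):
        if school not in sampled:
            continue
        if i == 0:
            candidates = [1, 2]
        elif i == last:
            candidates = [last - 1, last - 2]
        else:
            candidates = [i + 1, i - 1]
        chosen = []
        for j in candidates:
            if 0 <= j <= last and frame[j] not in used:
                chosen.append(frame[j])
                used.add(frame[j])
        replacements[school] = chosen
    return replacements

print(assign_replacements(list("ABCDEFG"), {"B", "E"}))
# {'B': ['C', 'A'], 'E': ['F', 'D']}
```

When two sampled schools are adjacent, each receives only one replacement, which mirrors the situation the footnote describes.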




Sample size requirements 


The ICILS 2018 consortium, in line with practice in other IEA studies, set high standards for sampling precision and aimed to achieve reasonably small standard errors for survey estimates. The student sample was required to ensure a specified level of precision for population estimates, defined by confidence intervals of ±0.1 standard deviation for means and ±5% for percentages. For the main outcome variable in ICILS 2018, the students' computer and information literacy (CIL) scale (established in ICILS 2013 with a mean of 500 score points and a standard deviation of 100 for equally weighted national samples from the first cycle), this requirement translated into standard errors below five score points. IEA was responsible for determining sample sizes that were expected to meet these requirements for each participating country. With the exception of one participating country (United States), which failed to meet the IEA sample participation standards, all participating countries and educational systems achieved this requirement (see Fraillon et al. 2020, p. 75). The required precision levels for percentages were also met for the vast majority of population estimates presented in the ICILS international report (Fraillon et al. 2020).
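The translation from confidence-interval widths to standard-error thresholds can be checked with simple arithmetic. The sketch below assumes that a 95% interval spans roughly plus or minus two standard errors. The report does not state the multiplier, and with 1.96 the bound comes out slightly above five points.

```python
def max_standard_error(half_width, z=2.0):
    """Largest standard error compatible with an interval of +/- half_width,
    assuming the interval spans +/- z standard errors."""
    return half_width / z

CIL_SD = 100  # standard deviation of the CIL scale (ICILS 2013 metric)

print(max_standard_error(0.1 * CIL_SD))                    # 5.0 score points
print(max_standard_error(5.0))                             # 2.5 percentage points
print(round(max_standard_error(0.1 * CIL_SD, z=1.96), 2))  # 5.1
```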


There were also other considerations that needed to be taken into account when determining the required number of sampled students and teachers:

• Some types of analysis, such as multilevel modeling, require a minimum number of valid cases at each sampling stage (see, for example, Meinck and Vandenplas 2012);

• For the purpose of building scales and sub-scales, a minimum number of valid entries per response item is required; and

• Reporting on subgroups (e.g., age or gender groups) requires a minimum sample size for each of the subgroups of interest.

All these considerations were taken into account when defining minimum sample sizes for schools, students, and teachers.


School sample sizes 


The minimum sample size for the ICILS main survey was 150 schools for each country.⁷ In some countries it was necessary to select more schools than the minimum sample size for one or more of the following reasons:

• Previous student surveys had shown a relatively large variation of student achievement between schools in a country. In these cases, it was assumed that the IEA standards for sampling precision could only be met by increasing the school sample size.

• The number of schools with fewer than 20 students in the target grade was relatively large, so that it was not possible to reach the student sample size requirements by selecting only 150 schools (see next section below).

• The country requested oversampling of particular subgroups of schools to accommodate national research interests.


Student sample sizes 

Typically 20 students were randomly selected from the full target grade cohort (i.e., across all classes) in each sampled school. In schools with 25 or fewer students in the target grade, all students were selected.⁸


7 Luxembourg conducted a census of schools, i.e., all 41 schools were asked to participate.

8 In the United States a minimum of 30 students were randomly selected, because students to be excluded could only be identified after the within-school sample was drawn. In Luxembourg, a census of students was conducted, so all students were selected.


Each country was required to have an achieved student sample size of about 3000 tested students. Due to non-response, school closures, or other factors, some countries did not meet this requirement. The ICILS 2018 sampling team did not regard this outcome as problematic as long as the country met the overall participation rate requirements (see Chapter 7).


Teacher sample sizes

Typically 15 teachers of the target grade were randomly selected in each sampled school. In schools with 20 or fewer teachers of the target grade, all teachers were selected.⁹


In summary, the minimum sample size requirements for ICILS were as follows:

• Schools: 150 in each country

• Students: 20 (or all) per school

• Teachers: 15 (or all) per school
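The per-school rules and the resulting minimum sample can be expressed as two small functions. The thresholds are the international defaults stated above. The national deviations in footnotes 8 and 9 (the United States, Luxembourg) are not modelled, and the function names are hypothetical.

```python
def students_to_sample(n_target_grade, per_school=20, take_all_up_to=25):
    """Students selected in one school: all if the target grade has
    25 or fewer students, otherwise 20."""
    return n_target_grade if n_target_grade <= take_all_up_to else per_school

def teachers_to_sample(n_teachers, per_school=15, take_all_up_to=20):
    """Teachers selected in one school: all if 20 or fewer, otherwise 15."""
    return n_teachers if n_teachers <= take_all_up_to else per_school

# 150 schools with, say, 60 target grade students each:
print(150 * students_to_sample(60))  # 3000
```

This is consistent with the figure of about 3000 tested students per country. Schools with 21 to 25 students add slightly more, and small schools add fewer.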


Table 6.4 lists the intended and achieved school sample sizes, the achieved student sample sizes, and the achieved teacher sample sizes for each participating country. Note that schools may have been treated as participating in the student survey but not in the teacher survey, and vice versa, due to specific minimum within-school response rate requirements. This explains differences in the numbers of participating schools for the student and teacher surveys across ICILS 2018 countries.¹⁰


Table 6.4: School, student, and teacher sample sizes

Country                            Originally   Student survey                Teacher survey
                                   sampled      Participating  Participating  Participating  Participating
                                   schools      schools        students       schools        teachers
Chile                              [illegible]  178            3092           174            [illegible]
Denmark                            [illegible]  [illegible]    2404           138            [illegible]
Finland                            [illegible]  [illegible]    2546           143            [illegible]
France                             [illegible]  156            2940           122            [illegible]
Germany                            234          209            3655           182            2328
Italy                              [illegible]  150            2810           148            [illegible]
Kazakhstan                         [illegible]  183            3371           184            2623
Korea, Republic of                 [illegible]  150            2875           147            2127
Luxembourg                         41           38             5401           28             494
Portugal                           220          200            3221           208            2823
United States                      352          263            6790           259            3218
Uruguay                            177          166            2613           171            1320
Benchmarking participants
Moscow (Russian Federation)        150          150            2852           150            2235
North Rhine-Westphalia (Germany)   115          109            1991           107            1468


9 In Luxembourg, the minimum sample size was increased to 25 teachers per school, due to the small number of schools.
10 Please refer to Chapter 7 for details on ICILS 2018 standards for sampling participation. 




Efficiency of the ICILS 2018 sample design 

As already noted, ICILS 2018 set specific goals in terms of sampling precision, in particular that standard errors should be kept below specified thresholds. Let us illustrate this concept in more detail for readers who are less familiar with this topic.


In any sample survey, researchers would like to use data collected from the sample to get a good (or “precise”) picture of the population from which the sample was drawn. However, there is a need to define what is “good” in terms of sampling precision. Statisticians aim for a sample that has as little variance and bias as possible within given design and cost limits. A measure of the precision is the standard error. The larger the standard error, the more “blurred” the picture is, and the less reliable inferences from sample data to populations become.


Let us assume our population of interest is the left-hand picture in Figure 6.2, the famous portrait of Einstein taken by Arthur Sasse in 1951. The picture consists of 340,000 pixels. We can draw samples with increasing numbers of pixels from this picture and reassemble the picture using only the sampled pixels. As can be seen in the middle and right-hand pictures in Figure 6.2, the picture obtained from the sampled pixels becomes more precise as the sample size increases. In this example, the sharpness of the pictures reassembled from different sample sizes reflects the sampling precision that standard errors express for survey estimates.


Figure 6.2: Illustration of sampling precision—simple random sampling 


Picture = Population Sample size = 10,000 Sample size = 50,000 


Determining sampling precision in infinite populations is relatively straightforward as long as simple random sampling (SRS) is employed. The standard error of the estimate of the mean from a simple random sample can be estimated as:

$$\sigma_{\bar{x}} = \sqrt{\frac{\sigma^{2}}{n}}$$

with $\sigma^{2}$ being the (unknown) variance in the population and $n$ being the sample size. If the variance in the population is known, the sample size needed for a given precision level can easily be derived from the formula. For example, assuming the standard deviation $\sigma$ of an achievement scale to be 100, the population variance $\sigma^{2}$ would be 10,000. If the standard error of the estimated scale mean $\sigma_{\bar{x}}$ is to be five scale score points or less, rearranging the formula ($n = \sigma^{2}/\sigma_{\bar{x}}^{2} = 10{,}000/25$) leads to a required minimum sample size of 400 students per country. As pointed out earlier, however, the actual minimum sample size for participating countries in ICILS 2018 was 3000 students.
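The SRS arithmetic can be checked directly, since rearranging the formula gives the required sample size as the population variance divided by the squared target standard error. The helper names below are our own.

```python
import math

def srs_standard_error(sigma, n):
    """Standard error of the mean under SRS from an infinite population."""
    return math.sqrt(sigma ** 2 / n)

def srs_required_n(sigma, max_se):
    """Smallest SRS sample size whose standard error does not exceed max_se."""
    return math.ceil(sigma ** 2 / max_se ** 2)

print(srs_required_n(100, 5))        # 400
print(srs_standard_error(100, 400))  # 5.0
```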




The key reason for this sample size requirement is that ICILS 2018 did not employ SRS but cluster sampling. Students in the sample are members of “clusters”, as groups of them belong to the same schools.


Students within a school tend to be more similar to one another than students from different schools because they are exposed to the same environment and teachers. Furthermore, they also often share common socioeconomic backgrounds. Therefore, sampling additional students within schools yields less information than sampling additional schools, even if the total sample size is kept constant. In other words, because of the homogeneity of students in the same schools, cluster samples tend to have lower sampling precision than simple random samples of similar size.
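A small simulation can make the clustering effect concrete. The sketch below assumes a hypothetical population in which school means vary with a standard deviation of 40 score points and students vary within schools with a standard deviation of about 91.65 points. This gives a total standard deviation close to 100 and an intraclass correlation of about 0.16. The sketch compares the variance of the mean from 150 schools × 20 students with that from an SRS of 3000 students. These parameters are illustrative, not ICILS estimates. Under this model, the ratio should come out near the textbook approximation 1 + (m − 1)ρ ≈ 4 (Kish), which the report itself does not use.

```python
import random
import statistics

def simulated_variance_ratio(n_schools=150, m=20, sd_between=40.0,
                             sd_within=91.65, reps=200, seed=1):
    """Ratio of the sampling variance of the mean under a two-stage cluster
    sample to that under SRS of the same total size (Monte Carlo sketch)."""
    rng = random.Random(seed)
    sd_total = (sd_between ** 2 + sd_within ** 2) ** 0.5
    cluster_means, srs_means = [], []
    for _ in range(reps):
        total = 0.0
        for _ in range(n_schools):
            school_effect = rng.gauss(0.0, sd_between)  # shared by the cluster
            total += sum(school_effect + rng.gauss(0.0, sd_within)
                         for _ in range(m))
        cluster_means.append(total / (n_schools * m))
        # SRS: every student comes from a different school, so draws are independent.
        srs_means.append(statistics.fmean(
            rng.gauss(0.0, sd_total) for _ in range(n_schools * m)))
    return statistics.variance(cluster_means) / statistics.variance(srs_means)

ratio = simulated_variance_ratio()
print(f"variance ratio ≈ {ratio:.1f}")
```

Applying the SRS formula to the clustered sample in this setup would understate the standard error by a factor of roughly the square root of that ratio.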


For this reason, the SRS formula given above is not applicable to data from cluster samples. In fact, depending also on the outcome variable being measured, applying this formula will most likely underestimate standard errors from cluster sample data by a considerable margin. Figure 6.3 visualizes this effect through the example of the Einstein portrait, where the number of pixels sampled is the same in both pictures. However, in the right-hand picture, clusters of pixels were sampled rather than single pixels as in the left-hand picture.


Figure 6.3: Sampling precision with equal sample sizes—simple random sampling versus cluster sampling 


Note that stratification also has an influence on sampling precision. By choosing stratification variables related to the outcome variables, it is possible to increase the sampling precision compared to non-stratified samples. However, experience shows that in large-scale assessments in education the impact of stratification on sampling precision tends to be much smaller than the effect of clustering. Stratification is another reason why the SRS formula for estimating sampling variance is not applicable to ICILS 2018 survey data.


Because of the above reasons, estimation of sampling variance for complex sample data is not as 
straightforward as it is for simple random samples. Chapter 13 of this report explains in more detail 
the jackknife repeated replication (JRR) method which should be used for a correct estimation of 
standard errors for ICILS 2018 data. 


The achieved efficiency of the ICILS sampling design is measured by the design effect:

$$\mathrm{deff} = \frac{\mathrm{Var}_{JRR}}{\mathrm{Var}_{SRS}}$$


where $\mathrm{Var}_{JRR}$ is the design-based sampling variance for a statistic estimated by the JRR method,¹¹ and $\mathrm{Var}_{SRS}$ is the estimated sampling variance for the same statistic on the same database but considering the sample as a simple random sample (with replacement, conditional on the achieved sample size of the variable of interest). If we can estimate the design effect in a given country from previous surveys with the same or at least equivalent outcome variables, we can also determine a desired sample size for a cluster sample design.

ICILS 2018 required an “effective” sample size of 400 students. Within the context of large-scale studies of education, the “effective” sample size is an estimate of the sample size that would be needed to achieve the same sampling precision as that of a cluster sample if simple random sampling had been applied. So, for example, in a country where the design effect in a previous survey was estimated at eight, multiplying this number by the effective sample size (8 × 400 = 3200 students) would provide an estimate of the desired sample size for the next survey, assuming that both surveys apply the same approach to stratification and clustering.

Table 6.5 to Table 6.7 provide the design effects of the ICILS 2018 main outcome variables for students and for teachers in each participating country. This information helps to determine sample sizes and design strategies in future surveys with similar objectives. The design effects vary across the different scales, but this is not unusual, since the similarity of students and teachers within schools should be higher when asking about school-related matters than, for example, about how frequently they use ICT as individuals. The last columns of Table 6.6 and Table 6.7 provide estimates of the effective sample sizes. In Table 6.6, the effective sample sizes relate to the design effects of the CIL and the CT scales, while in Table 6.7 the effective sample size relates to the average design effect across the presented scales. As is evident from Table 6.6, all national samples achieved or, more often, significantly exceeded the envisaged effective sample size of 400. Table 6.7 shows that the effective teacher sample size was also estimated at 400 or above in all ICILS 2018 countries.

The average design effect of the CIL scale is equal to 2.6. Most other scales pertaining to the student survey showed even lower design effects. The average design effect of the CT scale is 2.0, and it tends to be lower than the CIL scale design effect in most ICILS 2018 countries that administered the optional CT assessment. The teacher-related scales had an average design effect of 2.6 across ICILS 2018 countries.


11 The measurement error for the CIL scales is included in $\mathrm{Var}_{JRR}$. Chapter 13 provides further details on measurement error estimation.
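The design effect, the effective sample size, and the planning rule described in this section are simple ratios. The sketch below uses the example figures from the text (a design effect of eight and a target effective sample size of 400) and, as an illustration, the Chile row of Table 6.6. The function names are our own.

```python
def design_effect(var_jrr, var_srs):
    """deff = design-based (JRR) variance / SRS variance for the same statistic."""
    return var_jrr / var_srs

def effective_sample_size(n, deff):
    """SRS sample size that would give the same precision as the cluster sample."""
    return n / deff

def planned_sample_size(target_effective_n, deff):
    """Sample size needed in a new survey expected to have this design effect."""
    return target_effective_n * deff

print(planned_sample_size(400, 8))              # 3200
print(round(effective_sample_size(3092, 2.7)))  # 1145
```

Table 6.6 reports 1128 rather than 1145 for Chile. The difference presumably arises because the design effect shown there is rounded to one decimal (3092 / 1128 ≈ 2.74).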




Table 6.5: Design effects of main outcome variables, student survey

[Rotated table: country-level values are not legible in this extraction. Columns: country; sample size; design effects using scale scores pertaining to students' use of and engagement with ICT at home and school. Rows: the 12 participating countries, the two benchmarking participants, and the ICILS 2018 average.]




Table 6.6: Design effects and effective sample sizes of mean of scale scores and plausible values, student survey

Country                            Sample  Design effects using:   Effective sample size based on:
                                   size    Scale   CIL     CT      Scale       CIL         CT
                                           scores  PV 1-5  PV 1-5  scores      PV 1-5      PV 1-5
Chile                              3092    2.7     4.1     N/A     1128        761         N/A
Denmark                            2404    1.6     1.8     1.5     1515        1358        1611
Finland                            2546    1.6     2.5     2.2     1589        1037        1157
France                             2940    1.6     1.8     1.5     1847        1595        1899
Germany                            3655    2.3     3.1     3.0     1578        1186        1232
Italy                              2810    1.6     2.3     N/A     1758        1247        N/A
Kazakhstan                         3371    3.6     5.4     N/A     934         622         N/A
Korea, Republic of                 2875    2.2     2.1     2.9     1325        1368        980
Luxembourg                         5401    0.8     0.8     0.6     6445        6779        8447
Portugal                           3221    2.1     2.9     2.3     1517        1099        1424
United States                      6790    2.3     2.6     2.6     2984        2649        2637
Uruguay                            2613    1.7     3.1     N/A     1574        843         N/A
Benchmarking participants
Moscow (Russian Federation)        2852    1.9     2.2     N/A     1490        1283        N/A
North Rhine-Westphalia (Germany)   1991    1.4     1.8     1.5     1428        1097        1293
ICILS 2018 average                 3326    2.0     2.6     2.0     [illegible] [illegible] 2298


Note: N/A = not available. Scale scores = mean of scale scores; PV 1-5 = plausible values 1–5.




Table 6.7: Design effects of main outcome variables, teacher survey

[Rotated table: country-level values are not legible in this extraction. Columns: country; sample size; design effects using scale scores for the teacher survey scales; effective sample size. Rows: the 12 participating countries, the two benchmarking participants, and the ICILS 2018 average.]




References 
Fraillon, J., Ainley, J., Schulz, W., Friedman, T., & Duckworth, D. (2020). Preparing for life in a digital world. IEA International Computer and Information Literacy Study 2018 international report. Cham, Switzerland: Springer. https://www.springer.com/gp/book/9783030387808

Joncas, M., & Foy, P. (2012). Sample design and implementation. In M.O. Martin & I.V.S. Mullis (Eds.), Methods and procedures in TIMSS and PIRLS 2011. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.

Martin, M.O., Rust, K., & Adams, R.J. (1999). Technical standards for IEA studies. Amsterdam, the Netherlands: International Association for the Evaluation of Educational Achievement (IEA).

Meinck, S. (2015). Sampling design and implementation. In J. Fraillon, W. Schulz, T. Friedman, J. Ainley, & E. Gebhardt (Eds.), International Computer and Information Literacy Study 2013 technical report (pp. 67–86). Amsterdam, the Netherlands: International Association for the Evaluation of Educational Achievement (IEA).

Meinck, S., & Vandenplas, C. (2012). Evaluation of a prerequisite of hierarchical linear modeling (HLM) in educational research: The relationship between the sample sizes at each level of a hierarchical model and the precision of the outcome model. IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments, Special Issue 1, October 2012.

UNESCO. (2006). International Standard Classification of Education: ISCED 1997. Montreal, Canada: UNESCO Institute for Statistics.

Zuehlke, O. (2011). Sample design and implementation. In W. Schulz, J. Ainley, & J. Fraillon (Eds.), ICCS technical report. Amsterdam, the Netherlands: International Association for the Evaluation of Educational Achievement (IEA).


CHAPTER 7: 


Sampling weights, non-response adjustments, and participation rates


Sabine Tieck


Introduction 


One major objective of ICILS 2018 was to make use of data collected from samples of students, teachers, and schools within each country (or educational system) to make accurate, precise, and internationally comparable estimates of population characteristics. In order to achieve this objective, it is important to consider the particular features of the study's complex sample design (see Chapter 6 for details). Data from complex samples may not provide unbiased estimates of the corresponding population if analysts disregard the particular features of complex samples and treat data as if they were from a simple random sample.

One particularly important feature of a complex sample is that sampling units do not have equal selection probabilities. In ICILS 2018, this characteristic applied to the sampled students, teachers, and schools. Furthermore, non-participation has the potential to bias results, and differential patterns of non-response increase this risk. To account for these complexities, sampling weights and non-response adjustments were computed within each participating country, leading to an estimation (or “final”) weight for each sampled unit. All findings presented in ICILS 2018 reports are based on weighted data. Anyone conducting secondary analysis to report on population estimates should account for this in their analyses.¹

This chapter is largely based on Chapter 7 of the ICILS 2013 technical report (Meinck and Cortes 2015). It describes the conditions under which students, teachers, and schools were considered “participants”. Descriptions of how the several sets of weights and non-response adjustments were computed follow. Please note that Chapter 13 of this report covers the use of the jackknife replication method (needed for variance estimation). Subsequent sections describe the computation of participation rates at each sampling stage and the minimum participation requirements. The ICILS 2018 research team regarded response rates as an important indicator of data quality, and the achieved quality of sample implementation is presented for each country.


Types of sampling weights 
The ICILS 2018 final weights are the product of several weight components. Generally, it is possible to distinguish between two different types of weight components:


• Base (or design) weights: these reflect the selection probabilities of sampled units. They are computed separately for each sampling stage and therefore account for multiple-stage sampling designs (see Chapter 6 for details). The base weight of a sampled unit is the inverse of the product of the selection probabilities at every stage.

• Non-response adjustments: these aim to compensate for potential bias due to the non-participation of sampled units. As with base weights, they are computed separately for each sampling stage. The main function of this weight component is to redistribute the (base) weight of the non-respondents within a specific adjustment cell among the responding units in that cell. Such an “adjustment cell” contains sampling units that share specific features. For example, all private schools in a given region could comprise a stratum of schools, from which a sample of schools was selected. If some of the sampled schools refused to participate, then the remaining (participating) schools in this stratum would carry the (base) weight of the non-participating schools. This approach allows us to exploit the (usually) little information we have available about respondents and non-respondents, and to assume that school non-participation is associated with the different strata (see also Lohr 1999). The approach also assumes a non-informative response model, implying that non-response occurs completely at random within the adjustment cell (i.e., in ICILS 2018, within a stratum).


1 For further reading on the topic, we recommend Meinck (2015), Rust (2014), Groves et al. (2004), and Statistics Canada (2003).


Calculating student weights 
School base weight (WGTFAC1) 


The first sampling stage involved selecting schools in each country; the school base weight reflects the selection probabilities of this sampling step. When explicit stratification was used, the school samples were selected independently from within each explicit stratum $h$, with $h = 1, \ldots, H$. If no explicit strata were formed, the entire country was regarded as one explicit stratum.

Systematic random samples of schools were drawn in all countries, with the selection probability of school $i$ in stratum $h$ proportional to its size (PPS sampling). The measure of school size $M_{hi}$ was defined by the number of students in the target grade or an adjacent grade. If schools were small ($M_{hi} < 20$), the measure of size $M_{hi}$ was redefined as the average size of all small schools in that stratum. In a few countries, equiprobable systematic random sampling (SyRS) was applied in particular strata.


The school base weight was defined as the inverse of the school's selection probability. For school $i$ in stratum $h$, the school base weight was given by:

$$WGTFAC1_{hi} = \frac{M_h}{n_h \, M_{hi}} \quad \text{for PPS sampling, and}$$

$$WGTFAC1_{hi} = \frac{N_h}{n_h} \quad \text{for SyRS,}$$

where $n_h$ is the number of sampled schools in stratum $h$, $M_h$ is the total number of students enrolled in the schools of explicit stratum $h$, $M_{hi}$ is the measure of size of the selected school $i$, and $N_h$ is the total number of schools in stratum $h$.
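The two base-weight formulas, together with the redefinition of the measure of size for small schools described above, can be sketched as follows. The function names and the list-based input are assumptions for illustration. Replacing each small school's size by the average size of the small schools leaves the stratum total M_h unchanged.

```python
def small_school_mos(sizes, threshold=20):
    """Measure of size per school: schools with fewer than `threshold`
    students get the average size of all small schools in the stratum."""
    small = [s for s in sizes if s < threshold]
    if not small:
        return list(sizes)
    average_small = sum(small) / len(small)
    return [average_small if s < threshold else s for s in sizes]

def school_base_weight_pps(M_h, n_h, M_hi):
    """WGTFAC1 under PPS sampling: M_h / (n_h * M_hi)."""
    return M_h / (n_h * M_hi)

def school_base_weight_syrs(N_h, n_h):
    """WGTFAC1 under equiprobable systematic sampling: N_h / n_h."""
    return N_h / n_h

sizes = [10, 14, 50, 126]
mos = small_school_mos(sizes)              # [12.0, 12.0, 50, 126]
M_h = sum(mos)                             # 200.0, the same as sum(sizes)
print(school_base_weight_pps(M_h, 2, 50))  # 2.0
print(school_base_weight_syrs(40, 10))     # 4.0
```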


School non-response adjustment (WGTADJ1S) 


School base weights for participating schools needed to be adjusted to account for the loss in 
overall sample size from schools that either refused to participate or had to be removed from 
the international dataset due to low within-school participation. Adjustments were calculated 
within non-response groups defined by the explicit strata. A school non-response adjustment was 
calculated for each participating school i within each explicit stratum h as: 


$$\mathrm{WGTADJ1S}_{hi} = \frac{n_h^{\text{s-e}}}{n_h^{\text{p-std}}}$$

where $n_h^{\text{s-e}}$ is the number of sampled eligible schools and $n_h^{\text{p-std}}$ is the number of participating schools (whether originally sampled or replacement schools) in the student survey in explicit stratum $h$.


The number $n_h^{\text{s-e}}$ in this section is not necessarily equal to $n_h$ in the preceding section, as $n_h^{\text{s-e}}$ was restricted to schools deemed eligible in ICILS 2018. Because of the lapse of one or two years between school sampling and the actual assessment, some selected schools were no longer eligible for participation. This happened if schools had closed recently, did not have students in the target grade, or had only excluded students enrolled. Ineligible schools such as these were not taken into account when calculating the non-response adjustment.
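
As a minimal sketch (Python; the status labels and function name are hypothetical), the adjustment for one explicit stratum can be computed from one status per sampled school. A slot whose original school refused but whose replacement participated counts as participating, and ineligible schools drop out of both counts, as described above.

```python
def school_nonresponse_adjustment(statuses):
    """WGTADJ1S_h = n_h^{s-e} / n_h^{p-std} for one explicit stratum.

    statuses -- one entry per sampled school: "participating" (original or
    replacement school took part), "non-participating", or "ineligible".
    """
    n_eligible = sum(1 for s in statuses if s != "ineligible")
    n_participating = sum(1 for s in statuses if s == "participating")
    if n_participating == 0:
        raise ValueError("no participating schools in this stratum")
    return n_eligible / n_participating


stratum = ["participating"] * 8 + ["non-participating"] * 2 + ["ineligible"]
print(school_nonresponse_adjustment(stratum))  # 10 eligible / 8 participating
```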


SAMPLING WEIGHTS, NON-RESPONSE ADJUSTMENTS, AND PARTICIPATION RATES 


Student base weight (WGTFAC3S) 


The IEA Windows Within-School Sampling Software (IEA WinW3S) was used to manage within- 
school sampling during the second sampling stage (see Chapter 8 for details) in order to conduct 
a systematic random selection of students from the target grade. The student base weight for 
student k was calculated as: 


$$\mathrm{WGTFAC3S}_{hik} = \frac{M_{hi}}{m_{hi}}$$

where $M_{hi}$ is the total number of students in the target grade in school $i$ in stratum $h$ and $m_{hi}$ is the number of sampled students in school $i$ in stratum $h$.


In schools with fewer than 26 target-grade students, all eligible students were selected for participation. In these cases, the weight factor was set to a value of one.²
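
The within-school step can be sketched in Python as a systematic random selection from the list of target-grade students, followed by the base weight $M_{hi}/m_{hi}$. This is not the IEA WinW3S implementation; the function names are hypothetical, and the default of 20 sampled students per school is an assumption for illustration only (this section states only the take-all rule for schools with fewer than 26 target-grade students).

```python
import random


def systematic_sample(list_size, sample_size, rng):
    """Pick sample_size positions from a list using a random start and a
    fixed sampling interval (systematic random sampling)."""
    interval = list_size / sample_size
    start = rng.random() * interval
    # min() guards against floating-point overshoot on the last position
    return [min(int(start + j * interval), list_size - 1) for j in range(sample_size)]


def select_students(roster_size, cluster_size=20, take_all_below=26, rng=None):
    """Return (sampled positions, WGTFAC3S) for one school.

    cluster_size=20 is a hypothetical default, not a value from this report.
    """
    if roster_size <= 0:
        raise ValueError("school has no target-grade students")
    rng = rng or random.Random()
    m = roster_size if roster_size < take_all_below else min(cluster_size, roster_size)
    positions = systematic_sample(roster_size, m, rng)
    return positions, roster_size / m   # base weight M_hi / m_hi


print(select_students(25, rng=random.Random(1))[1])    # take-all school
print(select_students(100, rng=random.Random(7))[1])   # 100 / 20
```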


Student non-response adjustment (WGTADJ3S) 


Unfortunately, not all selected students were able or willing to participate in ICILS 2018. To account for the reduction in sample size due to within-school non-participation, a student non-response adjustment factor was introduced. Given the lack of information about absentees, non-participation had to be assumed, for weighting purposes, to be completely random within schools. This means that participating students represent both participating and non-participating students within a surveyed school, and their sampling weights had to be adjusted accordingly.


The adjustment for student non-response for each participating student $k$ was calculated as:

$$\mathrm{WGTADJ3S}_{hik} = \frac{m_{hi}^{\text{e}}}{m_{hi}^{\text{p}}}$$

with $m_{hi}^{\text{e}}$ being the number of eligible students in school $i$ in stratum $h$ and $m_{hi}^{\text{p}}$ being the number of participating students in school $i$ in stratum $h$. In the context of student weight adjustment, students of the target population were regarded as eligible if they had not been excluded due to disabilities or language problems.³
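
A Python sketch of this adjustment for a single school follows. The status labels are hypothetical; students who left the school after sampling are counted as absent (as the next paragraph explains), while excluded students are not eligible and therefore enter neither count.

```python
def student_nonresponse_adjustment(statuses):
    """WGTADJ3S_hik = m_hi^e / m_hi^p for one school.

    statuses -- one entry per sampled student: "participated", "absent",
    "left_school" (treated as absent) or "excluded" (not eligible).
    """
    eligible = [s for s in statuses if s != "excluded"]
    n_participating = sum(1 for s in eligible if s == "participated")
    if n_participating == 0:
        raise ValueError("no participating students in this school")
    return len(eligible) / n_participating


school = ["participated"] * 16 + ["absent"] * 2 + ["left_school"] * 2 + ["excluded"]
print(student_nonresponse_adjustment(school))  # 20 eligible / 16 participating
```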


Please note that sampled students who did not participate in the survey because they had left the sampled school after within-school sampling were counted as absent in the sampled school. These students were assumed to remain part of the target population (they had moved to a different school but had a zero chance of selection there, since within-school sampling had already been completed at this point). Excluded students within participating schools retained their weights (i.e., reflecting their proportion in the target population) and thereby contributed to the overall estimates of exclusion.


Final student weight (TOTWGTS) 


The final student weight of student k in school i in stratum h is the product of the four student- 
weight components: 


$$\mathrm{TOTWGTS}_{hik} = \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTADJ1S}_{hi} \times \mathrm{WGTFAC3S}_{hik} \times \mathrm{WGTADJ3S}_{hik}$$
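
To see how the four components fit together, the following Python sketch (hypothetical values) multiplies them for one school with 100 target-grade students, of whom 20 were sampled (all eligible) and 16 took part. Under these assumptions, with no exclusions, the final weights of the participating students add up to the school's target-grade enrolment multiplied by its two school-level factors.

```python
import math


def final_student_weight(wgtfac1, wgtadj1s, wgtfac3s, wgtadj3s):
    """TOTWGTS_hik as the product of the four student-weight components."""
    return math.prod((wgtfac1, wgtadj1s, wgtfac3s, wgtadj3s))


# Hypothetical school: M_hi = 100, m_hi = 20 sampled (all eligible), 16 participated
wgtfac1, wgtadj1s = 50.0, 1.25
wgtfac3s = 100 / 20     # student base weight
wgtadj3s = 20 / 16      # student non-response adjustment
w = final_student_weight(wgtfac1, wgtadj1s, wgtfac3s, wgtadj3s)
total = 16 * w          # sum over the 16 participating students
print(w, total)         # total equals wgtfac1 * wgtadj1s * 100
```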


2 Two countries deviated from this rule: in Luxembourg, all students were sampled, and in the United States, the sample size for students was set to 30.

3 For Chile this adjustment factor includes a gender adjustment factor as the original population estimates regarding the 
distribution of girls and boys did not match the recorded proportion of boys and girls in Chile (although the estimated 
total number of students did match the recorded population figures). This could be because the distribution of single- 
sex schools was not controlled for in the sample. The adapted adjustment factor fits the population estimates regarding 
the population size and the gender distribution. 




ICILS 2018 TECHNICAL REPORT 


Calculating teacher weights 
School base weight (WGTFAC1) 


As the same schools were sampled for the student survey and the teacher survey, the school base 
weight of the teacher survey was identical to the school base weight of the student survey. 


School non-response adjustment (WGTADJ1T) 


A school non-response adjustment for the teacher survey was calculated in the same way as the school non-response adjustment for the student survey. Given that schools could be regarded as participating in the student survey but not in the teacher survey, and vice versa, the school non-participation adjustment potentially differed between student and teacher data from the same school. To account for non-responding schools in the sample, it was necessary to calculate a school weight adjustment for the teacher survey as follows for school $i$:

$$\mathrm{WGTADJ1T}_{hi} = \frac{n_h^{\text{s-e}}}{n_h^{\text{p-tch}}}$$

Here, $n_h^{\text{s-e}}$ is again the number of sampled eligible schools and $n_h^{\text{p-tch}}$ is the number of schools participating (whether originally sampled or replacement schools) in the teacher survey in stratum $h$.


Teacher base weight (WGTFAC2T) 


A systematic random sampling method, carried out via the IEA WinW3S, was used to randomly 
select teachers in each school. 


The teacher base weight for teacher $l$ was calculated as:

$$\mathrm{WGTFAC2T}_{hil} = \frac{T_{hi}}{t_{hi}}$$

where $T_{hi}$ is the total number of eligible teachers in school $i$ in stratum $h$ and $t_{hi}$ is the number of sampled teachers in school $i$ in stratum $h$.

In schools with fewer than 21 target-grade teachers, all eligible teachers were selected for participation. In these cases, this weight factor was equal to one.⁴


Teacher non-response adjustment (WGTADJ2T) 


Not all teachers were willing or able to participate in the study. Therefore, participating teachers 
represented both participants and non-participants. Again, the non-response adjustment carried 
out within a given school assumed, for weighting purposes, that there was a random process 
underlying teachers’ participation. 


The non-response adjustment was computed for each participating teacher $l$ as:

$$\mathrm{WGTADJ2T}_{hil} = \frac{t_{hi}^{\text{s-e}}}{t_{hi}^{\text{p}}}$$

where $t_{hi}^{\text{s-e}}$ is the number of eligible sampled teachers and $t_{hi}^{\text{p}}$ is the number of participating teachers in school $i$ in stratum $h$. Teachers who left the school after they had been sampled but prior to data collection were regarded as out of scope, and their weights were not adjusted in these instances.


4 In Luxembourg, the number of teachers to select was increased to 20; thus, all eligible teachers were selected in schools with fewer than 25 target-grade teachers.




Teacher multiplicity factor (WGTFAC3T) 


Some teachers in ICILS 2018 were teaching at the target grade in more than one school (based 
on information from the teacher questionnaire) and therefore had a larger selection probability 
than those teaching at the target grade in only one school. In order to account for this, a “teacher 
multiplicity factor” was calculated as the inverse of the number of schools in which the teacher 
was teaching: 


$$\mathrm{WGTFAC3T}_{hil} = \frac{1}{f_{hil}}$$

Here, $f_{hil}$ is the number of schools where teacher $l$ in school $i$ in stratum $h$ was teaching.


Final teacher weight (TOTWGTT) 


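The final teacher weight for teacher $l$ in school $i$ in stratum $h$ is the product of the five teacher-weight components:

$$\mathrm{TOTWGTT}_{hil} = \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTADJ1T}_{hi} \times \mathrm{WGTFAC2T}_{hil} \times \mathrm{WGTADJ2T}_{hil} \times \mathrm{WGTFAC3T}_{hil}$$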
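
Putting the teacher components together, a hypothetical Python sketch for a single teacher is shown below. It assumes 30 eligible teachers of whom 15 were sampled, one sampled teacher who left the school before data collection (out of scope, so not counted as eligible), 12 participants, and a teacher who reported teaching the target grade in two schools; all values and names are illustrative only.

```python
import math


def final_teacher_weight(wgtfac1, wgtadj1t, t_total, t_sampled,
                         t_sampled_eligible, t_participating, n_schools_taught):
    """TOTWGTT_hil as the product of the five teacher-weight components."""
    wgtfac2t = t_total / t_sampled                    # T_hi / t_hi
    wgtadj2t = t_sampled_eligible / t_participating   # t_hi^{s-e} / t_hi^p
    wgtfac3t = 1 / n_schools_taught                   # 1 / f_hil
    return math.prod((wgtfac1, wgtadj1t, wgtfac2t, wgtadj2t, wgtfac3t))


w = final_teacher_weight(40.0, 1.1, t_total=30, t_sampled=15,
                         t_sampled_eligible=14, t_participating=12,
                         n_schools_taught=2)
print(round(w, 4))
```

The multiplicity factor halves this teacher's weight because the same person could have been selected through either of the two schools.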


Calculating school weights 


ICILS 2018 was designed as a survey of students and teachers but not as a school survey. However, 
in order to collect background information at school level, a principal questionnaire and an ICT 
coordinator questionnaire were administered to every participating school. School weights were 
calculated and included in the international database in order to allow for analyses at the school 
level. However, results at the school level should be interpreted with some caution as they may 
be subject to considerable sampling error. 


School base weight (WGTFAC1) 


This weight component is identical to the school base weight of the student survey and the teacher 
survey (see above). 


School weight adjustment (WGTADJ1C) 


Schools in which no items were completed in either the principal questionnaire or the ICT 
coordinator questionnaire were regarded as non-participants in the school survey. In order to 
account for these non-responding schools, a school weight adjustment component was calculated 
for each participating school i as follows: 


$$\mathrm{WGTADJ1C}_{hi} = \frac{n_h^{\text{s-e}}}{n_h^{\text{p-sch}}}$$

Here, $n_h^{\text{s-e}}$ represents the number of eligible sampled schools and $n_h^{\text{p-sch}}$ represents the number of schools with completed questionnaires in stratum $h$ (whether originally sampled or replacement schools).


Note that some schools may have been non-participants in the school survey but participated in the student and/or the teacher surveys. Consequently, some schools were regarded as participants in the student and/or teacher survey but as non-participants in the school survey. Some schools may also have completed (at least one of) the school-level questionnaires but were regarded as non-participants in the student and/or teacher surveys. It is very important to keep this in mind when undertaking analysis with data from different datasets. For multivariate analyses of this kind, which draw on different data sources, the proportion of missing values may accumulate with increasing numbers of variables. Those undertaking secondary analyses should thoroughly monitor the potential loss of information due to missing data across the different sampling units.
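
Because participation is judged separately for each survey, one stratum can yield three different school-level adjustment factors. The Python sketch below (hypothetical field names and data) derives WGTADJ1S, WGTADJ1T, and WGTADJ1C from per-school participation flags; all three share the same numerator $n_h^{\text{s-e}}$.

```python
from dataclasses import dataclass


@dataclass
class SchoolStatus:
    # Hypothetical flags for one eligible sampled school (or its replacement)
    student: bool   # counted as participating in the student survey
    teacher: bool   # counted as participating in the teacher survey
    school: bool    # completed at least one school-level questionnaire


def school_adjustments(schools):
    """Return WGTADJ1S, WGTADJ1T and WGTADJ1C for one explicit stratum.

    A factor is omitted if no school participated in that survey.
    """
    n_eligible = len(schools)
    counts = {
        "WGTADJ1S": sum(s.student for s in schools),
        "WGTADJ1T": sum(s.teacher for s in schools),
        "WGTADJ1C": sum(s.school for s in schools),
    }
    return {name: n_eligible / n for name, n in counts.items() if n > 0}


stratum = ([SchoolStatus(True, True, True)] * 7
           + [SchoolStatus(True, False, True)]
           + [SchoolStatus(False, False, True)]
           + [SchoolStatus(False, False, False)])
print(school_adjustments(stratum))   # 10/8, 10/7 and 10/9
```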




Final school weight 


The final school weight of school $i$ in stratum $h$ is the product of the two weight components:

$$\mathrm{TOTWGTC}_{hi} = \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTADJ1C}_{hi}$$


Calculating participation rates 


For ICILS 2018, weighted and unweighted participation rates were calculated at student and teacher levels to facilitate the evaluation of data quality and to assess the risk of potential bias due to non-response. In contrast to the weight adjustments described earlier, participation rates were computed first considering only the originally sampled schools and then considering both originally sampled and replacement schools.


Unweighted participation rates in the student survey 


Let $op$ denote the set of originally sampled eligible and participating schools, $fp$ the full set of eligible participating schools, including replacement schools, and $np$ the set of sampled eligible but non-participating schools in the student survey. Let $n_{op}$, $n_{fp}$, and $n_{np}$ denote the numbers of schools in each of the respective sets. The unweighted school participation rate in the student survey before replacement is calculated as:

$$\mathrm{UPRS}_{\mathrm{schools\_BR}} = \frac{n_{op}}{n_{fp} + n_{np}}$$

The unweighted school participation rate in the student survey after replacement is computed as:

$$\mathrm{UPRS}_{\mathrm{schools\_AR}} = \frac{n_{fp}}{n_{fp} + n_{np}}$$
fp np 


Let $sfp$ represent the set of eligible and participating students in all participating schools, that is, in the schools that constitute $fp$ as the complete set of eligible participating schools. Let $snp$ be the set of eligible but non-participating students in schools that constitute $fp$, and let $n_{sfp}$ and $n_{snp}$ be the number of students in these two respective groups. The unweighted student response rate is computed as:

$$\mathrm{UPRS}_{\mathrm{students}} = \frac{n_{sfp}}{n_{sfp} + n_{snp}}$$


Note that it was not deemed necessary to compute student response rates separately for 
originally sampled and replacement schools because non-response patterns did not vary between 
(participating) originally sampled and replacement schools. 


The unweighted overall participation rate in the student survey before replacement is then:

$$\mathrm{UPRS}_{\mathrm{overall\_BR}} = \mathrm{UPRS}_{\mathrm{schools\_BR}} \times \mathrm{UPRS}_{\mathrm{students}}$$

The unweighted overall participation rate in the student survey after replacement is:

$$\mathrm{UPRS}_{\mathrm{overall\_AR}} = \mathrm{UPRS}_{\mathrm{schools\_AR}} \times \mathrm{UPRS}_{\mathrm{students}}$$
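
The unweighted rates reduce to simple counts. A Python sketch with hypothetical counts (120 originally sampled and 20 replacement schools participating, 10 non-participating eligible schools, 2,700 of 3,000 eligible students responding) follows; the same function would apply to the teacher survey if teacher counts were substituted.

```python
def unweighted_rates(n_op, n_fp, n_np, n_sfp, n_snp):
    """Unweighted participation rates (as proportions) for the student survey.

    n_op  -- originally sampled eligible schools that participated
    n_fp  -- all eligible participating schools, incl. replacements
    n_np  -- sampled eligible schools that did not participate
    n_sfp -- participating eligible students in the fp schools
    n_snp -- non-participating eligible students in the fp schools
    """
    schools_br = n_op / (n_fp + n_np)
    schools_ar = n_fp / (n_fp + n_np)
    students = n_sfp / (n_sfp + n_snp)
    return {
        "schools_BR": schools_br,
        "schools_AR": schools_ar,
        "students": students,
        "overall_BR": schools_br * students,
        "overall_AR": schools_ar * students,
    }


rates = unweighted_rates(n_op=120, n_fp=140, n_np=10, n_sfp=2700, n_snp=300)
print({k: round(v, 3) for k, v in rates.items()})
```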


Weighted participation rates in the student survey 


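The weighted school participation rate in the student survey before replacement was calculated as the ratio of summations over all participating students $k$ in school $i$ in stratum $h$:

$$\mathrm{WPRS}_{\mathrm{schools\_BR}} = \frac{\sum_{h}\sum_{i \in op}\sum_{k \in sfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTFAC3S}_{hik} \times \mathrm{WGTADJ3S}_{hik}}{\sum_{h}\sum_{i \in fp}\sum_{k \in sfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTADJ1S}_{hi} \times \mathrm{WGTFAC3S}_{hik} \times \mathrm{WGTADJ3S}_{hik}}$$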




Here, the numerator sums over students in the originally sampled participating schools only, whereas the denominator sums over students in all participating schools.

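The weighted school participation rate in the student survey after replacement is:

$$\mathrm{WPRS}_{\mathrm{schools\_AR}} = \frac{\sum_{h}\sum_{i \in fp}\sum_{k \in sfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTFAC3S}_{hik} \times \mathrm{WGTADJ3S}_{hik}}{\sum_{h}\sum_{i \in fp}\sum_{k \in sfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTADJ1S}_{hi} \times \mathrm{WGTFAC3S}_{hik} \times \mathrm{WGTADJ3S}_{hik}}$$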


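The weighted student participation rate was computed as follows, again taking replacement schools into account:

$$\mathrm{WPRS}_{\mathrm{students}} = \frac{\sum_{h}\sum_{i \in fp}\sum_{k \in sfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTFAC3S}_{hik}}{\sum_{h}\sum_{i \in fp}\sum_{k \in sfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTFAC3S}_{hik} \times \mathrm{WGTADJ3S}_{hik}}$$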


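The weighted overall participation rate in the student survey before replacement is therefore:

$$\mathrm{WPRS}_{\mathrm{overall\_BR}} = \mathrm{WPRS}_{\mathrm{schools\_BR}} \times \mathrm{WPRS}_{\mathrm{students}}$$

The weighted overall participation rate in the student survey after replacement is:

$$\mathrm{WPRS}_{\mathrm{overall\_AR}} = \mathrm{WPRS}_{\mathrm{schools\_AR}} \times \mathrm{WPRS}_{\mathrm{students}}$$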
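
The weighted rates can be sketched in Python from student-level records. This is a minimal illustration under stated assumptions: each (hypothetical) record is a participating student in a participating school, carrying its four weight components and a flag for whether its school was originally sampled, so the triple sums over $h$, $i$, and $k$ collapse into sums over records.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    original_school: bool  # True if the student's school belongs to op
    wgtfac1: float
    wgtadj1s: float
    wgtfac3s: float
    wgtadj3s: float


def weighted_student_rates(records):
    """Weighted participation rates (as proportions) for the student survey."""
    base_adj = sum(r.wgtfac1 * r.wgtfac3s * r.wgtadj3s for r in records)
    base_adj_op = sum(r.wgtfac1 * r.wgtfac3s * r.wgtadj3s
                      for r in records if r.original_school)
    full = sum(r.wgtfac1 * r.wgtadj1s * r.wgtfac3s * r.wgtadj3s for r in records)
    base = sum(r.wgtfac1 * r.wgtfac3s for r in records)
    schools_br = base_adj_op / full
    schools_ar = base_adj / full
    students = base / base_adj
    return {
        "schools_BR": schools_br,
        "schools_AR": schools_ar,
        "students": students,
        "overall_BR": schools_br * students,
        "overall_AR": schools_ar * students,
    }


# Hypothetical stratum: 4 eligible schools, 3 participating (2 original, 1 replacement);
# in each participating school, 8 of 10 sampled students took part.
records = [StudentRecord(school < 2, 10.0, 4 / 3, 2.0, 10 / 8)
           for school in range(3) for _ in range(8)]
print(weighted_student_rates(records))
```

With equal weights, as in this toy stratum, the weighted rates coincide with the unweighted ones (2/4, 3/4, and 8/10); they diverge only when weights differ across strata or schools.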


Overview of participation rates in the student survey 


Table 7.1 and Table 7.2 display the unweighted and weighted participation rates of all countries in the student survey. Differences between the two tables indicate different response patterns among strata with disproportional sample allocations. For example, the unweighted school participation rate of Germany was considerably higher than the weighted rate (84.3 percent versus 78.9 percent before replacement) because the federal state of North Rhine-Westphalia, in which almost all schools participated, was oversampled within Germany. North Rhine-Westphalia participated in ICILS both as a state within Germany and as a benchmarking participant. In comparison, proportionally fewer schools participated in the remaining strata across Germany.


It should be noted that only those schools with a participation rate of at least 50 percent among their sampled students were treated as participants in the student survey. A school that did not meet this requirement was regarded as a non-participating school for the student data collection. The non-participation of such a school affected the school participation rate; however, the students from this school were excluded from the calculation of the student participation rate.




Table 7.1: Unweighted school and student participation rates—Student survey 


                                  School participation rate (%)   Student         Overall participation rate (%)
Country                           Before          After           participation   Before          After
                                  replacement     replacement     rate (%)        replacement     replacement

Chile                             91.1            99.4            93.6            85.2            [?]
Denmark                           76.0            95.3            85.3            64.8            81.3
Finland                           97.9            98.6            91.8            89.9            90.6
France                            99.4            100.0           94.7            94.1            94.7
Germany                           84.3            91.3            88.5            74.6            80.8
Italy                             95.3            100.0           95.0            90.6            95.0
Kazakhstan                        99.5            99.5            97.9            97.3            97.3
Korea, Republic of                100.0           100.0           96.7            96.7            96.7
Luxembourg                        92.7            92.7            90.1            83.5            83.5
Portugal                          86.3            91.3            81.1            70.0            74.1
United States                     67.5            76.9            90.8            61.4            69.9
Uruguay                           91.9            95.9            80.6            74.0            77.3

Benchmarking participants
Moscow (Russian Federation)       98.7            100.0           95.8            94.5            95.8
North Rhine-Westphalia (Germany)  92.9            97.3            91.6            85.0            89.1
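
As a worked check, the overall rates in Table 7.1 are the products of the school and student rates. Using Germany's rounded unweighted figures (84.3 and 91.3 percent school participation before and after replacement, 88.5 percent student participation); because the inputs are rounded, such products may differ from the tabulated value in the last digit for other rows:

$$0.843 \times 0.885 \approx 0.746 \;\Rightarrow\; 74.6\% \text{ before replacement}, \qquad 0.913 \times 0.885 \approx 0.808 \;\Rightarrow\; 80.8\% \text{ after replacement}$$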


Table 7.2: Weighted school and student participation rates—Student survey 


                                  School participation rate (%)   Student         Overall participation rate (%)
Country                           Before          After           participation   Before          After
                                  replacement     replacement     rate (%)        replacement     replacement

Chile                             91.0            100.0           93.1            84.8            93.1
Denmark                           75.6            95.3            84.8            64.1            80.8
Finland                           98.3            98.6            91.9            90.3            90.6
France                            99.4            100.0           95.0            94.4            95.0
Germany                           78.9            88.3            86.6            68.3            76.5
Italy                             95.1            100.0           94.9            90.3            94.9
Kazakhstan                        99.5            99.5            97.6            97.2            97.2
Korea, Republic of                100.0           100.0           96.7            96.7            96.7
Luxembourg                        96.4            96.4            90.1            86.9            86.9
Portugal                          85.7            90.2            80.0            68.6            72.2
United States                     67.4            77.1            91.0            61.4            70.2
Uruguay                           90.7            95.7            80.2            72.8            76.8

Benchmarking participants
Moscow (Russian Federation)       98.2            100.0           95.7            93.9            95.7
North Rhine-Westphalia (Germany)  92.6            97.4            91.0            84.2            88.6




Unweighted participation rates in the teacher survey 


The computation of participation rates in the teacher survey follows the same logic as applied in 
the student survey. 


Let $op$, $fp$, and $np$ be defined as above, such that the participation status now refers to the teacher survey instead of the student survey, and let $n_{op}$, $n_{fp}$, and $n_{np}$ be defined correspondingly. The unweighted school participation rate in the teacher survey before replacement is computed as:

$$\mathrm{UPRT}_{\mathrm{schools\_BR}} = \frac{n_{op}}{n_{fp} + n_{np}}$$

The unweighted school participation rate in the teacher survey after replacement is calculated as:

$$\mathrm{UPRT}_{\mathrm{schools\_AR}} = \frac{n_{fp}}{n_{fp} + n_{np}}$$


Let $tfp$ be the set of eligible and participating teachers in schools that constitute $fp$, $tnp$ be the set of eligible but non-participating teachers in schools that constitute $fp$, and let $n_{tfp}$ and $n_{tnp}$ be the number of teachers in the respective groups. The unweighted teacher response rate is defined as:

$$\mathrm{UPRT}_{\mathrm{teachers}} = \frac{n_{tfp}}{n_{tfp} + n_{tnp}}$$


Note that it was not deemed necessary to compute teacher response rates separately for 
(participating) originally sampled and replacement schools because the non-response patterns 
did not vary between sample and replacement schools. 


The unweighted overall participation rate in the teacher survey before replacement is computed as:

$$\mathrm{UPRT}_{\mathrm{overall\_BR}} = \mathrm{UPRT}_{\mathrm{schools\_BR}} \times \mathrm{UPRT}_{\mathrm{teachers}}$$

The unweighted overall participation rate in the teacher survey after replacement is calculated as:

$$\mathrm{UPRT}_{\mathrm{overall\_AR}} = \mathrm{UPRT}_{\mathrm{schools\_AR}} \times \mathrm{UPRT}_{\mathrm{teachers}}$$


Weighted participation rates in the teacher survey 


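The weighted school participation rate in the teacher survey before replacement is calculated as:

$$\mathrm{WPRT}_{\mathrm{schools\_BR}} = \frac{\sum_{h}\sum_{i \in op}\sum_{l \in tfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTFAC2T}_{hil} \times \mathrm{WGTADJ2T}_{hil} \times \mathrm{WGTFAC3T}_{hil}}{\sum_{h}\sum_{i \in fp}\sum_{l \in tfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTADJ1T}_{hi} \times \mathrm{WGTFAC2T}_{hil} \times \mathrm{WGTADJ2T}_{hil} \times \mathrm{WGTFAC3T}_{hil}}$$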


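The weighted school participation rate in the teacher survey after replacement is calculated as:

$$\mathrm{WPRT}_{\mathrm{schools\_AR}} = \frac{\sum_{h}\sum_{i \in fp}\sum_{l \in tfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTFAC2T}_{hil} \times \mathrm{WGTADJ2T}_{hil} \times \mathrm{WGTFAC3T}_{hil}}{\sum_{h}\sum_{i \in fp}\sum_{l \in tfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTADJ1T}_{hi} \times \mathrm{WGTFAC2T}_{hil} \times \mathrm{WGTADJ2T}_{hil} \times \mathrm{WGTFAC3T}_{hil}}$$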


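The weighted teacher participation rate is:

$$\mathrm{WPRT}_{\mathrm{teachers}} = \frac{\sum_{h}\sum_{i \in fp}\sum_{l \in tfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTFAC2T}_{hil} \times \mathrm{WGTFAC3T}_{hil}}{\sum_{h}\sum_{i \in fp}\sum_{l \in tfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTFAC2T}_{hil} \times \mathrm{WGTADJ2T}_{hil} \times \mathrm{WGTFAC3T}_{hil}}$$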


The weighted overall participation rate in the teacher survey before replacement is calculated as:

$$\mathrm{WPRT}_{\mathrm{overall\_BR}} = \mathrm{WPRT}_{\mathrm{schools\_BR}} \times \mathrm{WPRT}_{\mathrm{teachers}}$$

The weighted overall participation rate in the teacher survey after replacement is computed as:

$$\mathrm{WPRT}_{\mathrm{overall\_AR}} = \mathrm{WPRT}_{\mathrm{schools\_AR}} \times \mathrm{WPRT}_{\mathrm{teachers}}$$




Overview of participation rates in the teacher survey 


Table 7.3 and Table 7.4 display the unweighted and weighted participation rates of all countries in the teacher survey. Once more, discrepancies between the two tables indicate differential response patterns between strata with disproportional sample size allocations. As described earlier, Germany provides a prominent example of this effect.

Note that only those schools where at least 50 percent of their sampled teachers had completed the survey were regarded as participants in the teacher survey. A school that did not meet this requirement was regarded as a non-participating school in the teacher survey. The non-participation of such a school affected the school participation rate (for the teacher survey) but not the teacher participation rate, as the teachers from this school were not included in its calculation.


Table 7.3: Unweighted school and teacher participation rates—Teacher survey 


                                  School participation rate (%)   Teacher         Overall participation rate (%)
Country                           Before          After           participation   Before          After
                                  replacement     replacement     rate (%)        replacement     replacement

Chile                             89.9            97.2            93.5            84.1            90.9
Denmark                           72.7            92.0            84.4            61.4            77.7
Finland                           97.3            97.9            92.2            89.7            90.3
France                            78.2            78.2            81.0            63.4            63.4
Germany                           73.8            79.5            85.3            62.9            67.8
Italy                             94.0            98.7            92.8            87.3            91.6
Kazakhstan                        100.0           100.0           99.9            99.9            99.9
Korea, Republic of                100.0           100.0           100.0           100.0           100.0
Luxembourg                        68.3            68.3            75.0            51.2            51.2
Portugal                          90.9            95.9            91.4            83.0            87.6
United States                     66.1            75.7            88.6            58.5            67.1
Uruguay                           66.9            70.3            75.3            50.4            53.0

Benchmarking participants
Moscow (Russian Federation)       98.7            100.0           100.0           98.7            100.0
North Rhine-Westphalia (Germany)  91.1            95.5            90.3            82.3            86.3




Table 7.4: Weighted school and teacher participation rates—Teacher survey 


                                  School participation rate (%)   Teacher         Overall participation rate (%)
Country                           Before          After           participation   Before          After
                                  replacement     replacement     rate (%)        replacement     replacement

Chile                             91.2            96.9            93.6            85.3            90.7
Denmark                           70.4            92.0            84.0            59.2            77.3
Finland                           97.8            98.0            92.5            90.4            90.7
France                            78.4            78.4            80.6            63.2            63.2
Germany                           63.[?]          70.5            81.7            [?]             57.5
Italy                             93.8            98.6            91.9            86.2            90.6
Kazakhstan                        100.0           100.0           100.0           100.0           100.0
Korea, Republic of                100.0           100.0           100.0           100.0           100.0
Luxembourg                        68.5            68.5            [?]             51.8            51.8
Portugal                          89.0            95.3            91.6            81.5            87.3
United States                     62.2            72.4            89.4            55.6            64.7
Uruguay                           69.5            74.1            74.5            51.8            55.2

Benchmarking participants
Moscow (Russian Federation)       97.6            100.0           100.0           97.6            100.0
North Rhine-Westphalia (Germany)  90.2            95.6            91.1            82.2            87.2


ICILS 2018 standards for sampling participation 


It is a significant challenge within countries to achieve full participation (i.e., 100%) in a large-scale assessment, and the nature of these challenges varies across countries. Given that one essential purpose of ICILS is to report data at the national level, it is necessary to adjudicate for each country whether the achieved sample is sufficient to warrant scientifically defensible reporting of national estimates. As is customary in IEA studies, ICILS 2018 established guidelines for reporting data for countries with less than full participation. Adjudication of the data was done separately for each participating country and each of the two different ICILS 2018 survey populations. This was carried out in accordance with the recommendations of the sampling referee (Marc Joncas) and in agreement with all members of the ICILS Joint Management Committee.


The first step of the adjudication process was to determine the minimum requirements for within-school participation.


Within-school participation requirements


In general, decreasing response rates increase the risk of biased results. Because very little information about non-respondents was available, it was not possible to quantify the risk of bias in estimates due to non-participation in most countries. To address this, and in addition to the overall participation rate requirements described below, ICILS 2018 established strict standards for minimum within-school participation: data from schools with a response rate of less than half (50%) of sampled students or teachers, respectively, were discarded. This constraint meant that not every student or teacher who completed a survey instrument was automatically considered as participating and thereby contributing to the computation of population estimates.


The within-school response rate was computed separately for the student survey and the teacher survey. Therefore, a school may count as participating in the student survey but not in the teacher survey, or vice versa.
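
The 50 percent rule can be sketched in Python as below (hypothetical function name and counts). Because it is applied separately to each survey, the same school may pass for students and fail for teachers.

```python
def participates(n_sampled_eligible, n_responded, threshold=0.5):
    """True if at least `threshold` of the sampled eligible persons responded."""
    if n_sampled_eligible == 0:
        return False
    return n_responded / n_sampled_eligible >= threshold


# Hypothetical school: 15 of 20 students but only 7 of 15 teachers responded
print(participates(20, 15))   # student survey: counts as participating
print(participates(15, 7))    # teacher survey: treated as non-participating
```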




Student survey participation requirements 

Students were regarded as respondents if they replied to at least one task in the achievement test. Please note, however, that the overall amount of partial non-response (i.e., omitted items in questionnaires or tasks that had not been attempted) was minimal.

There is evidence that attendance and academic performance tend to be positively correlated (Balfanz and Byrnes 2012; Hancock et al. 2013). Consequently, the likelihood of biased results may increase as within-school response rates decrease.

Whenever there was evidence that the survey operation procedures in a school had not been conducted following the established ICILS 2018 standards, the corresponding school was regarded as a non-participant. For example, if a school failed to list all eligible students for the selection of a student sample, thus causing a risk of bias due to insufficient coverage, the corresponding school's student data were not included in the final database.


Teacher survey participation requirements

Teachers were regarded as respondents if they replied to at least one item in the teacher questionnaire. Again, as with the students, the overall amount of partial non-response (i.e., omitted items in the questionnaires) was low.

It is possible that specific groups of teachers tend to be less likely to participate in a survey. In order to help reduce non-response bias, a school was only regarded as a “participating school” in the teacher survey if at least 50 percent of its sampled teachers participated. If the response rate was lower, teacher data from this school were disregarded and the school was treated as non-participating.

If a school failed to follow the survey operation procedures properly, it was classified as a non-participating school. For example, if a school failed to list all eligible teachers for the teacher sample selection, or if the standard teacher selection procedures had not been followed, then teacher data from this particular school were not included in the final database.

Country-level participation requirements
Three categories for sampling participation were defined:

• Countries grouped in Category 1 met the ICILS 2018 sampling participation requirements.

• Countries in Category 2 met these requirements only after the inclusion of replacement schools.

• Countries in Category 3 failed to meet the ICILS 2018 sampling participation requirements.


Sampling participation categories for the teacher survey were identical to the ones in the student survey. The results from ICILS 2018 show that high response rates in the teacher survey were often harder to achieve than in the student survey. However, there is no statistical justification for applying different sampling participation standards to the two surveys. Since non-response holds a high potential for bias in both parts of the study, the participation requirements in the teacher survey were identical to those in the student survey. No participation requirements were determined for the reporting of school-level data; however, the participation rate in the school survey was above 85 percent for all countries that were placed in Category 1 and Category 2 for the student survey.

The three categories for sampling participation were defined according to the criteria presented in Figure 7.1.




Figure 7.1: Participation categories in ICILS 


Category 1: Satisfactory sampling participation rate without the use of replacement schools.

In order to be placed in this category, a country has to have:

• An unweighted school response rate without replacement of at least 85 percent (after rounding to the nearest whole percent) and an unweighted overall student/teacher response rate (after rounding) of at least 85 percent

or

• A weighted school response rate without replacement of at least 85 percent (after rounding to the nearest whole percent) and a weighted overall student/teacher response rate (after rounding) of at least 85 percent

or

• A product of the (unrounded) weighted school response rate without replacement and the (unrounded) weighted overall student/teacher response rate of at least 75 percent (after rounding to the nearest whole percent).

Category 2: Satisfactory sampling participation rate only when replacement schools were included.

A country will be placed in this category if it fails to meet the requirements for Category 1 but has either an unweighted or weighted school response rate without replacement of at least 50 percent (after rounding to the nearest percent)

and has either

• An unweighted school response rate with replacement of at least 85 percent (after rounding to the nearest whole percent) AND an unweighted overall student/teacher response rate (after rounding) of at least 85 percent

or

• A weighted school response rate with replacement of at least 85 percent (after rounding to the nearest whole percent) AND a weighted overall student/teacher response rate (after rounding) of at least 85 percent

or

• A product of the (unrounded) weighted school response rate with replacement and the (unrounded) weighted overall student/teacher response rate of at least 75 percent (after rounding to the nearest whole percent).

Category 3: Unacceptable sampling response rate even when replacement schools are included.

Countries that can provide documentation to show that they complied with ICILS sampling procedures, but do not meet the requirements for Category 1 or Category 2, will be placed in Category 3.
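
The decision rules in Figure 7.1 can be sketched in Python as follows. This is an illustrative reading, not the adjudication procedure itself: it interprets the figure's "overall student/teacher response rate" as the student (or teacher) participation rate column of Tables 7.1 to 7.4, so that the product criterion equals the overall participation rate; it rounds half up; and all names are hypothetical. It does not model the documentation requirement for Category 3 or any adjudication exceptions, and it is not claimed to reproduce the reported categories.

```python
import math


def pct_round(x):
    """Round a percentage to the nearest whole percent (half-up, an assumption)."""
    return math.floor(x + 0.5)


def meets_criteria(u_school, u_person, w_school, w_person):
    """Apply the three alternative criteria of Figure 7.1 (all rates in percent)."""
    return ((pct_round(u_school) >= 85 and pct_round(u_person) >= 85)
            or (pct_round(w_school) >= 85 and pct_round(w_person) >= 85)
            or pct_round(w_school * w_person / 100) >= 75)  # product of unrounded rates


def participation_category(before, after):
    """before/after: dicts of rates without/with replacement schools."""
    if meets_criteria(**before):
        return 1
    if (max(pct_round(before["u_school"]), pct_round(before["w_school"])) >= 50
            and meets_criteria(**after)):
        return 2
    return 3


# Student-survey rates for Denmark, taken from Tables 7.1 and 7.2
denmark_br = dict(u_school=76.0, u_person=85.3, w_school=75.6, w_person=84.8)
denmark_ar = dict(u_school=95.3, u_person=85.3, w_school=95.3, w_person=84.8)
print(participation_category(denmark_br, denmark_ar))
```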




Reporting data 


The ICILS 2018 research team considered it necessary to make readers of the international report aware of the increased potential for bias, regardless of whether such a bias was actually introduced. Based on their respective sample participation categories, national survey results were reported as follows:

• Category 1: Countries in this category appear in the tables and figures in the international reports without annotation.

• Category 2: Countries in this category are annotated in the tables and figures in the international reports.

• Category 3: Countries in this category appear in a separate section of the tables.


For the student survey, nine countries and both benchmarking participants met the sampling participation requirements for Category 1 and were reported without annotation, while six countries and the two benchmarking participants were in this category for the teacher survey. Two countries were in Category 2 for the student survey⁵ and one country was in this category for the teacher survey. One country had its student survey results reported in a separate section of the tables as a Category 3 country, while for the teacher survey this was the case for five countries. All ICILS 2018 countries and benchmarking participants had participation rates of their originally sampled schools above 50 percent.

Table 7.5 lists the participation categories of each country for the student and the teacher surveys.

Table 7.5: Achieved participation categories by country


                                  Participation category
Country                           Student survey    Teacher survey

Chile                             [?]               [?]
Denmark                           [?]               [?]
Finland                           [?]               [?]
France                            [?]               [?]
Germany                           [?]               [?]
Italy                             [?]               [?]
Kazakhstan                        [?]               [?]
Korea, Republic of                [?]               [?]
Luxembourg                        [?]               [?]
Portugal                          [?]               [?]
United States                     [?]               [?]
Uruguay                           [?]               [?]

Benchmarking participants
Moscow (Russian Federation)       [?]               [?]
North Rhine-Westphalia (Germany)  [?]               [?]


5 Please note that Portugal is reported in Category 2 for the student survey because it only slightly missed the required minimum participation rate.




References

Balfanz, R., & Byrnes, V. (2012). Chronic absenteeism: Summarizing what we know from nationally available data. Baltimore, MD: Johns Hopkins University Center for Social Organization of Schools.

Groves, R. M., Fowler, F. J., Couper, M. P., Lepkowski, J. M., Singer, E., & Tourangeau, R. (2004). Survey methodology. New York, NY: Wiley.

Hancock, K. J., Shepherd, C. C. J., Lawrence, D., & Zubrick, S. R. (2013). Student attendance and educational outcomes: Every day counts. Canberra, Australia: Report for the Department of Education, Employment and Workplace Relations.

Lohr, S. L. (1999). Sampling: Design and analysis. Pacific Grove, CA: Duxbury Press.

Meinck, S. (2015). Computing sampling weights in large-scale assessments in education. Survey methods: Insights from the field. http://surveyinsights.org/?p=5353

Meinck, S., & Cortes, D. (2015). Sampling weights, nonresponse adjustments, and participation rates. In J. Fraillon, W. Schulz, T. Friedman, J. Ainley, & E. Gebhardt (Eds.), International Computer and Information Literacy Study 2013 technical report (pp. 87-112). Amsterdam, The Netherlands: International Association for the Evaluation of Educational Achievement (IEA).

Rust, K. (2014). Sampling, weighting, and variance estimation in international large-scale assessments. In L. Rutkowski, M. von Davier, & D. Rutkowski (Eds.), Handbook of international large-scale assessment. New York: CRC Press.

Statistics Canada. (2003). Survey methods and practices. Ottawa, Canada: Author. https://www150.statcan.gc.ca/n1/pub/12-587-x/12-587-x2003001-eng.pdf


CHAPTER 8: 
ICILS 2018 field operations 


Ekaterina Mikheeva and Sebastian Meyer 


Introduction

Successful administration of the ICILS 2018 assessment depended heavily on the contributions of the study's national research coordinators (NRCs) and national center staff. As is the situation for all large-scale cross-national surveys, administration of the assessment, along with the overall coordination and logistical aspects of the study, presented a set of significant challenges for each participating country. These challenges were heightened by the demands of administering the ICILS 2018 student instruments on computer.

The ICILS international study center (ISC) at ACER, in cooperation with IEA, therefore developed internationally standardized field operations procedures to assist the NRCs and to aid uniformity of their instrument-administration activities. The international team designed these procedures to be flexible enough to simultaneously meet the needs of individual participants and the high quality expectations of IEA survey standards. The team began by referring to the ICILS 2013 study procedures and those used in other IEA studies, such as IEA's Progress in International Reading Literacy Study (PIRLS), Trends in International Mathematics and Science Study (TIMSS), and International Civic and Citizenship Education Study (ICCS), and then tailoring these to suit the specific requirements of ICILS 2018, most importantly the computer-based administration of the student instruments.

All national centers received guidelines on the survey operations procedures for each stage of the assessment. The guidelines advised on contacting schools, listing and sampling students, preparing materials for data collection, administering the assessment, scoring the assessment, and creating data files. National centers also received materials on procedures for quality control and were asked to complete online questionnaires that asked for feedback on the survey activities.


Field operations personnel 


The role of the national research coordinators and their centers 


One of the first steps that all countries or education systems participating in ICILS 2018 had to take when establishing the study in their country was to appoint an NRC. The NRC acted as the main contact person for all those involved in ICILS 2018 within the country and was the country representative at the international level. NRCs were in charge of the overall implementation of the study at the national level. They also, where necessary, implemented and adapted internationally agreed-upon procedures for the national context under the guidance of the international project staff and national experts.

The role of school coordinators and test administrators


In order to facilitate successful administration of ICILS 2018, the international team required the


establishment of two roles within countries: the school coordinator and the test administrator. 
Their work involved preparing for the test administration in schools and carrying out the data 
collection in a standardized way. 




In cooperation with school principals, national centers identified and trained school coordinators for all participating schools. The school coordinator could be a teacher or other staff member in the school. The school coordinator could also be the test administrator at the school, but was not to be a teacher of any of the sampled students. In some cases, national centers appointed external individuals as school coordinators. The coordinators' responsibilities included the following major tasks:

• Identifying eligible students and teachers belonging to the target population to allow the national center to perform within-school sampling;
• Arranging the date(s) and modalities of the test administration, in particular the delivery method of the student test, with the national center;
• Distributing instruments and related materials needed for test administration and making sure they were kept in a secure place and confidential at all times;
• Working with the school principal, the test administrator, and the affected teachers to plan and administer the student testing; and
• Ensuring that the test administrators return all testing materials after the testing session.

The test administrators were mainly responsible for administering the student test and questionnaire. They were employed either by the national center or directly by the schools. Accordingly, a training session was run by the national center centrally or by the schools to make sure that the test administrators were adequately prepared to run the assessment sessions.

Field operations resources

Manuals and documentation


The international study team released the ICILS 2018 survey operations procedures manuals to the NRCs in five units, each of which was accompanied by additional materials, including manuals for use in schools and software packages. All of this material was organized and distributed chronologically according to the stages of the study.

The five units and their accompanying manuals and software packages were:

• Unit 1: Sampling Schools specified the actions and procedures required to develop a national sampling plan in compliance with the international ICILS 2018 sample design.
• Unit 2: Working with Schools contained information about how to work with schools in order to plan for successful administration of the ICILS 2018 instruments.
• Unit 3: Instrument Preparation described the processes involved in preparing the ICILS 2018 instruments for production and use in countries.
• Unit 4: Data Collection and Quality Monitoring Procedures dealt with the processes involved in preparing for, supporting, and monitoring ICILS 2018 data collection in schools.
• Unit 5: Post Collection Data Capture, Data Upload, Scoring and Parental Occupation Coding provided guidelines on post-data collection processes and tasks. These included, but were not limited to, data capture from the paper questionnaires, uploading student assessment data, scoring student responses, and coding parental occupations.

The School Coordinator Manual, subject to translation, described the role and responsibilities of the school coordinator, the main contact person within each participating school.

The Test Administrator Manual, subject to translation, described the role and responsibilities of the test administrator, whose work included administration of the student assessment.

The National Quality Observer Manual provided national quality control observers with information about ICILS 2018, their role and responsibilities during the project, and the timelines, actions, and procedures to be followed in order to carry out the national quality control programs.

The International Quality Observer Manual provided international quality observers with information about ICILS 2018, their role and responsibilities during the project, and the timelines, actions, and procedures to be followed in order to carry out the international quality observer programs.

The Scoring Guides for Constructed-Response Items, subject to translation, provided detailed and explicit guidelines on how to score each constructed-response item.

The Compatibility Check and School Computer Resources Survey: Instructions for NRCs and Preparing Computers for ICILS - Instructions for School Coordinators Manual addressed whether computers in the sampled schools could be used for the ICILS 2018 assessment and whether special arrangements needed to be made in order to administer the assessment.


Software 

The international project team also supplied NRCs with software packages to assist with data collection and scoring of constructed-response items:

• IEA Translation System: This web-based application supported translation, translation verification, and layout verification of the teacher, principal, and ICT coordinator questionnaires. This software also allowed for cultural adaptations and verification for English-speaking countries that did not require translations.
• AM Designer (RM Results): This web-based application supported translation, translation verification, and layout verification of the student instruments (test modules, tutorial, and questionnaire). This software also allowed for cultural adaptations and verification for English-speaking countries that did not require translations.
• AM Examiner: This software was used to administer the computer-based ICILS 2018 test and contextual questionnaire to the students. The software was run from USB sticks either on existing computers in the schools or on a set of laptop computers provided specifically for the ICILS 2018 administration. Alternatively, a laptop server administration method was used when it was not possible to run the test software on USB sticks connected to individual computers.
• AM Marker: This web-based application enabled scoring administrators, scoring team leaders, and scorers to manage and carry out the scoring process for constructed-response items. The software allowed NRCs to create scoring teams and to assign scorers to scoring teams. The software also included a training tool that enabled navigation between sections, scoring of training responses, scoring of student responses, and flagging of responses for review scoring.
• IEA Coding Expert: This client-based application enabled coding administrators, coding team leaders, and coding staff to manage and carry out the coding process for open-ended student responses with respect to parents' occupation in the student questionnaire. The software allowed NRCs to create teams and to assign staff to coding teams. The software also included a training tool that enabled coding of student responses and flagging of responses for review coding.
• IEA Windows Within-School Sampling Software (IEA WinW3S): This enabled the ICILS 2018 national centers to select students and teachers in each sampled school in agreement with sample design specifications and mandatory sampling algorithms. National centers used the software to track school, teacher, and student information, prepare the survey tracking forms, and assign test instruments to students.
• IEA Online Survey System (IEA OSS): This software enabled verified text passages in the questionnaires to be transferred from the IEA Translation System to online questionnaires, with these online versions then delivered to respondents.
• IEA Data Management Expert (IEA DME): This software facilitated the entering of paper questionnaire data. The IEA DME software also allowed national adaptations to be made to the questionnaires and provided a set of data quality control checks.

In addition to preparing the software and manuals, IEA conducted data-management training designed to train national center staff on all procedures and on the software supporting these procedures, namely IEA WinW3S, the IEA and RM Results translation systems, IEA DME, and the AM Examiner. This seminar was combined with a scoring training, during which national center staff were trained to use the AM Marker. Instructions for using the ICILS translation systems were covered in one of the regular NRC meetings.


Field operations processes 


Linking students and teachers to schools 


The international project staff established a system to assign hierarchical identification codes (IDs). These uniquely identified and allowed tracking of the sampled schools, teachers, and students. Table 8.1 presents the hierarchical identification system codes.

Every sampled student was assigned an eight-digit ID number unique within each country. Each number consisted of the four-digit number identifying the school, followed by a two-digit number identifying the student group within the school (01 for all), and a two-digit number identifying the student within that group.

Each sampled target-grade teacher was assigned a teacher ID number consisting of the four-digit school number followed by a two-digit teacher number unique within the school.

Table 8.1: Hierarchical identification codes

Unit                                     ID components                                              ID structure   Numeric example
School (principal and ICT coordinator)   School (C)                                                 CCCC           1001
Student                                  School (C), Student group (G, constant: 01), Student (S)   CCCCGGSS       10010101
Teacher                                  School (C), Teacher (T)                                    CCCCTT         100101
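Because each ID is formed by concatenating fixed-width, zero-padded components, a student or teacher can always be linked back to its school by taking the first four digits. A minimal Python sketch of the scheme in Table 8.1, assuming IDs are handled as strings; the function names are hypothetical and are not taken from IEA WinW3S:

```python
def school_id(school: int) -> str:
    """Four-digit school ID (CCCC)."""
    if not 0 <= school <= 9999:
        raise ValueError("school number must fit in four digits")
    return f"{school:04d}"


def student_id(school: int, student: int, group: int = 1) -> str:
    """Eight-digit student ID (CCCCGGSS); the group is the constant 01 in ICILS 2018."""
    if not (0 <= group <= 99 and 0 <= student <= 99):
        raise ValueError("group and student numbers must fit in two digits")
    return f"{school_id(school)}{group:02d}{student:02d}"


def teacher_id(school: int, teacher: int) -> str:
    """Six-digit teacher ID (CCCCTT)."""
    if not 0 <= teacher <= 99:
        raise ValueError("teacher number must fit in two digits")
    return f"{school_id(school)}{teacher:02d}"


def school_of(any_id: str) -> str:
    """Recover the school ID from a student or teacher ID (its first four digits)."""
    return any_id[:4]
```

For example, `student_id(1001, 1)` yields `10010101` and `teacher_id(1001, 1)` yields `100101`, matching the numeric examples in Table 8.1; `school_of` returns `1001` for both.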


Activities for working with schools 


In ICILS 2018, the within-school sampling process and the assessment administration required close cooperation between the national centers and representatives from the schools, that is, the school coordinators and test administrators as described previously. Figure 8.1 presents the major activities the national centers conducted when working with schools to list and sample students and teachers, track respondents, prepare for test administration, and carry out the assessment.


Contacting schools and within-school sampling procedures 


Once NRCs had obtained a list of the schools sampled for ICILS 2018 (for more information on sampling procedures, please refer to Chapter 6 of this report), it was important for the success of the study that national centers established good working relationships with the selected schools. NRCs were responsible for contacting the schools and encouraging them to take part in the assessment, a process that often involved obtaining support from national or regional educational authorities or other stakeholders, depending on the national context.




In cooperation with school principals, national centers identified and trained school coordinators for all participating schools. The school coordinator could be a teacher or guidance counselor in the school. In cases where the school coordinator also acted as the test administrator at the school, he or she was not allowed to be a teacher of the sampled class. In some cases, national centers appointed one of their own members to fill this role. Often this person was responsible for several schools in an area. Each school coordinator was provided with an ICILS School Coordinator Manual, which described their responsibilities in detail and encouraged them to contact the national center if they had any questions.


School coordinators were required to provide all required information about their respective 
schools and additionally coordinate the date, time, and place of the student assessment. School
coordinators were also responsible for arranging modalities of the test administration with
the national center, for example, regarding the use of school or externally provided computers. 
This work required them to complete the school computer resources survey, run the USB-based 
compatibility check, and send results to the national study center. School coordinators were also 
responsible for obtaining parental permission as necessary, liaising with the test administrator to 
coordinate the test session, distributing teacher, school, and ICT coordinator questionnaires, and 
coordinating completion of the student tracking forms and teacher tracking forms. School coordinators 
also ensured that assessment materials were received, kept secure at all times, and returned to 
the national center after the administration. 


National centers sent a student listing form to each school coordinator and asked them to provide 
information on all the eligible target-grade students in the school. School coordinators collected 
details about these students, such as their names (if country regulations allowed names to 
be provided to the national centers), birth month and year, gender, exclusion status,¹ and the
assessment language of the student (in cases where the national center provided different language
versions of the student instruments).


The national centers used this information to sample students within the schools. Listing all eligible 
students in the target grade was key to ensuring that every student in the target population had 
a known chance of being sampled, an essential requirement for obtaining random samples from 
all of the target-grade students at and across schools. 
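The requirement that every listed student have a known chance of selection can be illustrated with an equal-probability systematic draw. This is a minimal sketch under stated assumptions: the record fields mirror the listing-form details named above, but the class, the function, and the per-school sample size `n` are hypothetical, and the actual selection was carried out in IEA WinW3S, whose algorithm this sketch does not claim to reproduce.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class ListedStudent:
    """One row of a student listing form (field names are hypothetical)."""
    name: Optional[str]  # None where regulations do not allow names to be sent
    birth_month: int
    birth_year: int
    gender: str
    excluded: bool       # exclusion status is recorded, so the student stays on the list
    language: str        # assessment language, if several language versions exist


def systematic_sample(listing: list[ListedStudent], n: int,
                      rng: random.Random) -> tuple[list[ListedStudent], float]:
    """Draw an equal-probability systematic sample of n students from the listing.

    Returns the selected rows and the inclusion probability n / N that every
    listed student shared before the draw.
    """
    N = len(listing)
    if N <= n:
        return list(listing), 1.0          # small school: take every listed student
    interval = N / n                        # sampling interval (> 1 here)
    start = rng.random() * interval         # random start in [0, interval)
    # Step through the list; min() guards against float rounding at the end.
    picks = [min(int(start + k * interval), N - 1) for k in range(n)]
    return [listing[i] for i in picks], n / N
```

Because every listed student has the same inclusion probability n/N, its inverse N/n can serve as the within-school component of a base weight; a student missing from the listing would have a selection probability of zero, which is why complete listing matters.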


National centers also sent a teacher listing form to each school coordinator and asked them to provide
information on all the eligible target-grade teachers within the school. The school coordinators 
listed the eligible target-grade teachers and provided details about these teachers, such as their 
names (if country regulations allowed names to be provided to the national center), birth month 
and year, and gender. The national centers used the collected information to sample teachers 
within the schools. 


1 Although all students enrolled in the target grade were part of the target population, ICILS 2018 recognized that some 
student exclusions were necessary because of a physical or intellectual disability, or in cases of non-native language 
speakers without the language proficiency to complete the assessment. Accordingly, the sampling guidelines allowed 
for the exclusion of students with any of several disabilities (for more information on sampling procedures, please see 
Chapter 6). Countries were required to track and account for all students, yet flagged those for which exemptions were 
defined. Because the local definition of such disabilities could vary from country to country, it was important that the 
conditions under which countries excluded students were carefully documented. 




Figure 8.1: Activities with schools

National center: Track school information
• Update school information, merge/obtain contact information
• Initialize IEA WinW3S: provide complete database information, import school sample database provided by IEA, translate and/or adapt survey tracking forms (e.g., student listing form)
• Record sampled school's participation status, use replacement if necessary
• Create student listing forms and teacher listing forms (printed or electronic) and send to school coordinators for completion

Schools: Within-school listing
• School coordinator lists all in-scope students on the student listing form
• School coordinator lists all in-scope teachers on the teacher listing form
• School coordinator sends the completed forms back to the national center

National center: Confirm assessment administration resources and method
• Set up system to record and follow up results of school resources surveys and software compatibility checks

Schools: Confirm assessment administration resources and method
• School coordinator arranges (or completes) USB-based school resources survey and software compatibility check

National center: Sample students and teachers
• Manually enter counts from student listing and/or teacher listing forms (number of students and teachers), create student and/or teacher records and enter information, OR import student listing and/or teacher listing forms directly
• Sample teachers
• Generate teacher tracking forms
• Sample students (includes assigning instrument rotation)
• Generate student tracking forms (paper and/or electronic)
• Print instrument labels for teacher, principal, and ICT coordinator questionnaires
• Send tracking forms and labeled survey instruments to schools

National center: Confirm assessment administration resources and method
• Confirm assessment administration process for each participating school based on information from the school

Schools: Assessment administration
• Test administrators track student participation on student tracking forms
• School coordinators track teacher participation on teacher tracking forms
• School coordinators/test administrators send the completed forms back to the national center

National center: Track student and teacher participation status
• Import/enter student participation information from student tracking forms
• Import/enter teacher participation information from teacher tracking forms
• Import student participation data availability status from test administration system
• Import online questionnaire data availability status from the IEA OSS Monitor




Preparing the computer-based test delivery at schools 


Because ICILS 2018 was a computer-based assessment, it was necessary to test the computer 
resources available at participating schools to ascertain whether the school computer resources 
could be used to deliver the assessment. 


The compatibility check and school computer resources survey were administered in order to
answer two questions: (i) whether school computers could be used for the testing or whether schools would
need to be provided with computers able to do this task; and (ii) whether, in those cases where the school
computers could be used for testing, special arrangements would be needed (e.g., altering the
configuration of computers or using a laptop with the local server connected to the school LAN)
for the USB-based student test to run correctly.


The process of administering the compatibility check and school resources survey required NRCs
in non-English-speaking countries to translate the school computer resources survey questions
and to make them available along with the USB compatibility check file to school coordinators on
a USB stick.


After receiving the USB sticks containing the compatibility check files and instructions, school 
coordinators were required to: 


• Run the USB compatibility check on every computer that was to be used for the ICILS 2018 assessment;
• Complete one of the included school computer resources surveys per school; and
• Send the results back to the national study center.


This information on the availability and compatibility of the participating schools’ computers 
enabled national centers to determine the best test delivery method for each school. 


The national centers then sent the following items to each school: the necessary tracking forms,
labels, questionnaires (online or paper-based), and manuals as well as USB sticks matching the 
number of students listed on the student tracking form (plus three extra sticks). 
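The decision the national centers made from the compatibility check and resources survey, together with the number of USB sticks sent to each school, can be sketched as follows. The decision rule, the record fields, and the function names are illustrative assumptions that only combine the three delivery routes mentioned in this chapter (USB sticks on school computers, a laptop server on the school LAN, or laptops provided for the assessment); the national centers' actual criteria are not specified here.

```python
from dataclasses import dataclass


@dataclass
class SchoolCheckResult:
    """Hypothetical summary of one school's resources survey and compatibility check."""
    computers_available: int    # school computers reported in the resources survey
    computers_passing_usb: int  # of those, computers that passed the USB compatibility check
    students_per_session: int   # students to be tested at the same time
    school_lan_available: bool  # a laptop server could be connected to the school LAN


def choose_delivery(result: SchoolCheckResult) -> str:
    """Pick one of the three delivery routes named in the text (illustrative rule)."""
    needed = result.students_per_session
    if result.computers_passing_usb >= needed:
        return "USB sticks on school computers"
    if result.computers_available >= needed and result.school_lan_available:
        return "laptop server on school LAN"
    return "laptops provided for the assessment"


def usb_sticks_to_send(students_on_tracking_form: int, spares: int = 3) -> int:
    """One stick per listed student plus three extra sticks."""
    return students_on_tracking_form + spares
```

For a school with 20 students on its tracking form, `usb_sticks_to_send(20)` returns 23. Ordering the checks from least to most logistical effort reflects a preference for using existing school equipment when it works.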


Administering the assessment at schools 


The process of distributing the printed materials and the electronic student instruments to the
schools required the national centers to engage in careful organization and planning. 


The national centers sent teacher questionnaires to each teacher listed on the teacher tracking 
form, in each school. They also sent a principal questionnaire to each school’s principal and an ICT 
coordinator questionnaire to each school’s ICT coordinator. 


The national centers furthermore prepared and sent cover letters containing login information 
and instructions on how to complete the online questionnaire to all teachers, school principals, 
and ICT coordinators who had elected to complete their questionnaires online. National center 
staff sent the packaged materials to the school coordinators prior to the testing date and asked 
them to confirm the receipt of all instruments. School coordinators then distributed the school 
questionnaire and teacher questionnaires (or the cover letters for the online participants) while 
ensuring that the other instruments were kept in a secure room until the assessment date. 


In accordance with the international guidelines and requirements as well as local conditions, national centers assigned a test administrator to each school. In some cases, the school coordinator also acted as the test administrator. The test administrators received training from the national centers. Their responsibilities included running a pretest administration on the day of testing in order to confirm that the student computers were prepared for the test; distributing materials to the appropriate students; logging in and initializing the test on the computers (either via the USB sticks provided by the national centers or the server method); leading students through the assessment; and accurately timing the sessions.




The student tracking forms indicated, for each sampled student, the assigned student instrument, 
which consisted of the two test-item modules and the student questionnaire, administered via 
the ICILS 2018 Student Test Software. For ICILS 2018 countries administering computer and 
information literacy (CIL) test modules only (i.e., Chile, Italy, Kazakhstan, Uruguay, and Moscow, 
Russian Federation), administration of the assessment consisted of three parts, the first two 
of which required students to complete CIL test modules and the third to answer the student 
questionnaire. In countries participating in the computational thinking (CT) test modules (i.e., 
Denmark, Finland, France, Germany, Korea, Luxembourg, Portugal, the United States, and North 
Rhine-Westphalia, Germany), the test session consisted of five parts with both CT test modules 
following the two CIL test modules and student questionnaire parts. Test administrators were 
requested to document student participation on the student tracking forms. 
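The two session designs can be expressed as a simple lookup from participant to the ordered list of session parts. This is an illustrative sketch: the participant groupings come from the paragraph above, while the set names, part labels, and function are hypothetical, and it does not model which two CIL modules each student was assigned.

```python
CIL_ONLY = {
    "Chile", "Italy", "Kazakhstan", "Uruguay", "Moscow (Russian Federation)",
}
WITH_CT = {
    "Denmark", "Finland", "France", "Germany", "Korea, Republic of", "Luxembourg",
    "Portugal", "United States", "North Rhine-Westphalia (Germany)",
}


def session_parts(participant: str) -> list[str]:
    """Ordered parts of the student session for a given participant."""
    parts = ["CIL test module 1", "CIL test module 2", "Student questionnaire"]
    if participant in WITH_CT:
        # Both CT modules follow the CIL modules and the student questionnaire.
        return parts + ["CT test module 1", "CT test module 2"]
    if participant in CIL_ONLY:
        return parts
    raise KeyError(f"unknown participant: {participant!r}")
```

Keeping the two groups as disjoint sets makes it easy to confirm that all 14 participants (12 countries and 2 benchmarking participants) are assigned exactly one design.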


During the administration of the assessment, test administrators were required to provide a range of instructions to students. When administering some parts of the assessment, test administrators were asked to read instructions to the students as provided to them in the Test Administrator Manual. Administrators had to read the text to the students exactly as it appeared in the script. In other parts of the assessment, test administrators were required to read instructions from a script but had the option of modifying or adapting it to best suit a given situation.


In these instances, it was essential that the exact contents and meaning of each of the scripts were conveyed to each set of students. The only instances in which test administrators could use their own words were when the Test Administrator Manual did not include a script for the instructions, for example, when the manual explicitly advised administrators that they could answer any questions or points of clarification.


The time allotted for each part of the student testing and questionnaire administration was standardized across countries. In all countries, target-grade students were allowed 30 minutes to complete each of the two modules (60 minutes in total). Students who completed the assessment before the allotted time was over were allowed to review answers or read quietly, but were not allowed to leave the session. Students were given around 25 minutes to complete the student questionnaire and were allowed to continue if they needed additional time. Test administrators were required to document the starting and ending time of each part of the assessment administration on the test administration form.


In countries administering the two CT modules, the test session was extended after the regular CIL test and student questionnaire sessions (as described above) by an additional 25 minutes for each of the two CT test modules. Table 8.2 details the time allotted to the different parts of the student assessment.


Once the administration was completed, the school coordinators were responsible for collecting 
and returning all materials to their respective national center. 


Online data collection of school principal, ICT coordinator, and teacher questionnaires 


As in the previous cycle, ICILS 2018 offered participating countries the option of administering 
the principal, ICT coordinator, and teacher questionnaires online instead of in paper form. To 
ensure comparability of the data from the online and the paper modes, only those countries that 
had previously tested the online data collection during the ICILS 2018 field trial were allowed to 
use the online option during the main survey. All countries used the online administration mode 
for their schools. 


After the principal, ICT coordinator, and teacher questionnaires had gone through the translation 
and translation verification processes, they were prepared for delivery online using the IEA Online 
Survey System (IEA OSS) software as described in more detail in Chapter 5 of this report. 




Table 8.2: Timing of the ICILS assessment

Activities                                                            Length
Preparation of students, reading of instructions, and administering
the tutorial                                                          20 minutes (approx.)
Administering the CIL student assessment (first module)               30 minutes (exact)
Short break                                                           5 minutes (max.)
Administering the CIL student assessment (second module)              30 minutes (exact)
Short break                                                           5 minutes (max.)
Administering the student questionnaire                               20 minutes (approx.)
Longer break (CT only)                                                Between 15 and 45 minutes
Administering the CT student assessment (first module; CT only)       25 minutes (exact)
Administering the CT student assessment (second module; CT only)      25 minutes (exact)
Collecting the assessment materials and ending the session            5 minutes (approx.)
TOTAL (CIL only)                                                      2 hours (approx.)
TOTAL (including CT)                                                  3.5 hours (approx.)
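As a quick arithmetic check on Table 8.2, the sketch below adds up the listed timings. It is illustrative only. It assumes that the approximate and maximum timings can be summed as fixed values, and the dictionary and function names are hypothetical.

```python
# Session components from Table 8.2, in minutes. The short breaks are
# given as maxima ("5 minutes (max.)") and are counted at their maximum.
CIL_PARTS = {
    "preparation and tutorial": 20,
    "CIL module 1": 30,
    "short break 1": 5,
    "CIL module 2": 30,
    "short break 2": 5,
    "student questionnaire": 20,
    "collecting materials": 5,
}
CT_BREAK_RANGE = (15, 45)  # longer break, CT sessions only
CT_MODULES = (25, 25)      # two CT modules, CT sessions only


def session_length(include_ct: bool) -> tuple[int, int]:
    """Return the (shortest, longest) session length in minutes."""
    base = sum(CIL_PARTS.values())
    if not include_ct:
        return base, base
    ct = sum(CT_MODULES)
    return base + ct + CT_BREAK_RANGE[0], base + ct + CT_BREAK_RANGE[1]


print(session_length(False))  # (115, 115)
print(session_length(True))   # (180, 210)
```

Under these assumptions, the sums match the table. A CIL-only session takes about 115 minutes (roughly 2 hours). With CT it takes 180 to 210 minutes (3 to 3.5 hours), depending on the length of the longer break.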


The IEA OSS is a hierarchical model of a survey that stores and manages all questionnaire-related information, including text passages, translations and adaptations, verification rules, variable names, and information for data management. To serve the different possible usage scenarios, the IEA OSS comprises three distinct components.


The Designer component was used to create, delete, disable, and edit survey components (e.g., questions and categories) and their properties. It enabled translation of all text passages in the existing national paper questionnaires and additional system texts, and it included a complete web server to verify and preview the survey exactly as if under live conditions. The Designer also supported the export of codebooks to IEA's generic data entry software, the IEA DME, to enable isomorphic data entry of online and paper questionnaires. The Web component was a compiled application that provided questionnaires in HTML format to the respondents for completion within standard internet browsers. Finally, the web-based Monitor component allowed national centers to audit participation in real time. It also allowed the centers to follow up with schools when questionnaires were incomplete or not returned, in a similar way to that used in the administration of the paper questionnaires. The live systems were hosted on dedicated high-performance servers rented from a reliable and experienced solution provider in Germany.


The electronic versions of the ICILS 2018 principal, ICT coordinator, and teacher questionnaires could only be completed via the internet. Accordingly, the design ensured that online respondents needed only an internet connection and a standard internet browser. No additional software or particular operating system was required. Respondents were not allowed to use other delivery options, such as sending PDF documents via email or printing out the online questionnaires and mailing them to the national center.


To limit the administrative burden and necessary communication with schools, national centers made the initial decision on whether to assign the online or paper questionnaire as a default to respondents. This decision was based on the centers' and schools' prior experience of participation in similar surveys and during the ICILS field trial. Usually, every respondent in a particular school was assigned the same mode, either online or paper. However, national centers were requested to take into account the mode that a specific school or a particular individual preferred. National centers had to ensure that every respondent assigned to the online mode by default had the option to request and complete a paper questionnaire, regardless of the reasons for being unwilling or unable to answer online.
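The mode-assignment rule described above can be sketched as a school-level default with individual overrides. The sketch assumes that a respondent's request for paper (or any other preferred mode) is simply recorded against their ID. The `School` record and `assign_mode` function are hypothetical and not part of the IEA OSS.

```python
from dataclasses import dataclass, field


@dataclass
class School:
    """Hypothetical record for one sampled school."""
    default_mode: str  # "online" or "paper", chosen by the national center
    # Respondents who asked for a different mode, e.g. {"T07": "paper"}.
    requested: dict[str, str] = field(default_factory=dict)


VALID_MODES = ("online", "paper")


def assign_mode(school: School, respondent_id: str) -> str:
    """Return the school default unless this respondent requested otherwise."""
    mode = school.requested.get(respondent_id, school.default_mode)
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    return mode


school = School(default_mode="online", requested={"T07": "paper"})
print(assign_mode(school, "T01"))  # online
print(assign_mode(school, "T07"))  # paper
```

Keeping the override per respondent, rather than per school, reflects the requirement that any individual assigned to the online mode could still request a paper questionnaire.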


To ensure confidentiality and separation, every respondent received individual login information. 
The national centers sent this information, along with general information on how to access the 
online questionnaire, to respondents in the form of “cover letters.” In line with the procedures used 
during distribution of the paper questionnaires, the school coordinator delivered this information 
to the designated individuals. 


During the administration period, respondents could log in and out as many times as they needed 
and could resume answering the questionnaire at the question they had last responded to in 
their previous session. Answers were automatically saved whenever respondents moved to 
another question, and respondents could change any answer at any time before completing the
questionnaire. During the administration, the national center was available for support; the center, 
in turn, could contact IEA if unable to solve a problem locally. 
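The save-and-resume behavior described above can be modeled as a small state object. This is a hypothetical sketch, not the IEA OSS implementation. It assumes that each answer is saved when the respondent moves on, that a new login resumes at the question answered last, and that answers are frozen once the questionnaire is completed.

```python
class OnlineSession:
    """Hypothetical state of one respondent's online questionnaire."""

    def __init__(self, question_ids):
        self.question_ids = list(question_ids)
        self.answers = {}          # question id -> last saved answer
        self.last_answered = None  # question id answered most recently
        self.completed = False

    def save_and_move(self, question_id, answer):
        """Save an answer when the respondent moves to another question."""
        if self.completed:
            raise RuntimeError("questionnaire already completed")
        self.answers[question_id] = answer  # overwrites any earlier answer
        self.last_answered = question_id

    def complete(self):
        """After completion, answers can no longer be changed."""
        self.completed = True

    def resume_at(self):
        """Question shown after a new login."""
        if self.last_answered is None:
            return self.question_ids[0]
        return self.last_answered


session = OnlineSession(["Q1", "Q2", "Q3"])
session.save_and_move("Q1", 3)
session.save_and_move("Q2", "yes")
print(session.resume_at())  # Q2
```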




The navigational structure of the online questionnaire had to be as similar as possible to that of 
the paper questionnaires. Respondents could use “next” and “previous” buttons to navigate to an 
adjacent page, as if they were flipping physical pages. In addition, a hypertext “table of contents” 
mirrored the experience of opening a specific page or question of a paper questionnaire. While most 
respondents followed the sequence of questions directly, these two features allowed respondents 
to skip or omit questions, just as if they were answering a self-administered paper questionnaire. 


To further ensure the similarity of the two sets of questionnaires, responses to the online
questionnaires were not made mandatory, evaluated, or enforced in detail (e.g., using hard
validations). Instead, some questions used soft validation. For example, questions that asked for a
numerical response, such as the total number of students enrolled in a school, had a specified
minimum and maximum value. Respondents' entries to such questions were accepted and stored
even if they fell outside this range, with the caveat that the response still needed to fit within the
specified width of the response field.
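A minimal sketch of the soft validation described above follows. It assumes that "width" means the maximum number of digits in the response field, that blank entries are accepted because answers were not mandatory, and that an out-of-range value is stored together with a warning rather than rejected. The function name and messages are hypothetical.

```python
def soft_validate(raw: str, minimum: int, maximum: int, width: int):
    """Return (stored_value, warning) for a numeric entry.

    Entries that are not whole numbers, or that are longer than the field
    width, are not stored. Out-of-range entries that fit the width are
    stored but flagged with a warning.
    """
    raw = raw.strip()
    if raw == "":
        return None, None  # answers are not mandatory: blank is accepted
    if not raw.isdecimal() or len(raw) > width:
        return None, f"Please enter a whole number with at most {width} digits."
    value = int(raw)
    if not minimum <= value <= maximum:
        return value, (f"{value} is outside the expected range "
                       f"{minimum} to {maximum}; please check your entry.")
    return value, None


print(soft_validate("450", 1, 3000, 4))    # (450, None)
print(soft_validate("5200", 1, 3000, 4))   # stored as 5200, with a warning
print(soft_validate("12000", 1, 3000, 4))  # not stored: exceeds the width
```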


Certain differences in the representation of the two modes remained, however. To reduce response 
burden and complexity, the online survey automatically skipped questions not applicable to the 
respondent, in contrast to the paper questionnaire, which instructed respondents to proceed 
to the next applicable question. Rather than presenting multiple questions per page, the online 
questionnaire proceeded question by question. 
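The automatic skipping of non-applicable questions can be sketched as filter conditions attached to questions. The question IDs and the filter below are hypothetical. The sketch only illustrates the idea that the online system evaluates the filter, whereas paper respondents follow a written instruction.

```python
# Hypothetical question list: each entry may carry a filter that decides,
# from earlier answers, whether the question applies to this respondent.
QUESTIONS = [
    ("Q1", None),
    ("Q2", lambda answers: answers.get("Q1") == "yes"),  # only if Q1 is "yes"
    ("Q3", None),
]


def next_question(current, answers):
    """Return the next applicable question after `current` (None = start),
    or None when the end of the questionnaire is reached."""
    ids = [qid for qid, _ in QUESTIONS]
    start = 0 if current is None else ids.index(current) + 1
    for qid, applies in QUESTIONS[start:]:
        if applies is None or applies(answers):
            return qid
    return None


print(next_question("Q1", {"Q1": "no"}))   # Q3
print(next_question("Q1", {"Q1": "yes"}))  # Q2
```

In this sketch, a filter question left unanswered causes the dependent question to be skipped. An actual survey system could just as well show it, so this is a design choice rather than documented ICILS behavior.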


While vertical scrolling was required for a few questions, particularly the longer questions with 
multiple “yes/no” or Likert-type items, horizontal scrolling was not. Because respondents could 
easily estimate through visual cues the length and burden of a paper questionnaire, the online 
questionnaires attempted to offer this feature through progress counters and a “table of contents” 
that listed each question and its response status. Multiple-choice questions were implemented 
with standard HTML radio buttons. 
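The progress counter and the "table of contents" with response status can be sketched as follows. The output format is a hypothetical assumption, and treating any stored answer as "answered" is a simplification.

```python
def table_of_contents(question_ids, answers):
    """Return (rows, progress) for a hypothetical table of contents.

    rows pairs each question id with its response status; progress is a
    counter string such as "1 of 3 questions answered".
    """
    rows = [(qid, "answered" if qid in answers else "not answered")
            for qid in question_ids]
    done = sum(1 for _, status in rows if status == "answered")
    return rows, f"{done} of {len(question_ids)} questions answered"


rows, progress = table_of_contents(["Q1", "Q2", "Q3"], {"Q1": 2})
print(rows)      # [('Q1', 'answered'), ('Q2', 'not answered'), ('Q3', 'not answered')]
print(progress)  # 1 of 3 questions answered
```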


Because the national centers were able to monitor the responses to the online questionnaires
in real time, they could send reminders to those schools where individuals had not responded in the
expected period of time. Typically, in these cases, the centers asked the school coordinators to 
follow up with those individuals who had not responded. 
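The follow-up step described above can be sketched as grouping outstanding respondents by school, so that each school coordinator receives one reminder listing the people still to respond. The status values and tuple layout are hypothetical. A real rule would also check how much time had passed since the invitation, which is omitted here.

```python
from collections import defaultdict


def schools_needing_reminders(respondents):
    """Group respondents who have not completed their questionnaire by school.

    `respondents` is an iterable of (school_id, respondent_id, status)
    tuples; status is a hypothetical value such as "completed",
    "in progress", or "not started".
    """
    pending = defaultdict(list)
    for school_id, respondent_id, status in respondents:
        if status != "completed":
            pending[school_id].append(respondent_id)
    return dict(pending)


data = [
    ("S1", "T1", "completed"),
    ("S1", "T2", "not started"),
    ("S2", "P1", "completed"),
    ("S3", "T9", "in progress"),
]
print(schools_needing_reminders(data))  # {'S1': ['T2'], 'S3': ['T9']}
```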


Although countries using the online mode in ICILS 2018 faced parallel workload and complexity 
before and during the data collection, they had the benefit of a reduction in workload afterwards. 
Because answers to online questionnaires were already in electronic format and stored on servers 
maintained by IEA, there was no need for separate data entry. 




Online data collection for survey activities questionnaires 


In order to collect feedback about survey operations from NRCs, the international project team set
up a survey activities questionnaire online. The questionnaire was prepared and administered using
the IEA OSS. As the survey activities questionnaire, unlike the other ICILS 2018 questionnaires, 
did not require national adaptations and was completed in English, it was well suited for online 
data collection. 


The purpose of the survey activities questionnaire was to gather opinions and information about 
the strengths and weaknesses of the ICILS 2018 assessment materials (e.g., test instruments, 
manuals, scoring guides, and software) as well as countries’ experiences with the ICILS 2018 survey 
operations procedures. NRCs were asked to complete these questionnaires with the assistance of 
their data managers and the rest of the national center staff. The information was used to evaluate 
survey operations. It is also being used to improve the quality of survey activities and materials 
used in future ICILS cycles. 


IEA sent the NRCs individual login information and internet links for accessing the online 
questionnaires. Before submitting the responses to IEA, NRCs could go back and change their 
answers if necessary. 


Scoring the assessment and checking scorer reliability 


Scoring the assessment 
The success of assessments containing constructed-response items depends on the degree to 
which student responses are scored reliably. Seventeen of the ICILS 2018 CIL assessment items 
were constructed-response items, and five large tasks were scored against a total of 37 criteria. 
One of the large-task criteria was automatically scored by the ICILS scoring system. Human scorers 
reviewed the automatically generated suggested score and could either accept or modify the 
score. Of the 102 ICILS 2018 CIL items, 53 were scored by human scorers, and it was critical to 
the quality of the ICILS 2018 results that these tasks were scored in a reliable manner. Reliability 
was accomplished by providing national centers with explicit scoring guides, extensive training 
of scoring staff, and continuous monitoring of the quality of the work during scoring procedures. 
There were 17 CT items for ICILS 2018. Two of these items were constructed response and 
scored by human scorers. 
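The counts in the paragraph above are consistent with each other under one reading: the automatically pre-scored large-task criterion is not counted among the 53 human-scored items. The arithmetic below makes this reading explicit. It is an interpretation of the figures, not a statement from the scoring documentation.

```python
# Counts quoted in the text for the ICILS 2018 CIL assessment.
total_cil_items = 102
constructed_response_items = 17
large_task_criteria = 37      # across the five large tasks
auto_suggested_criteria = 1   # scored by the system, reviewed by scorers

# Assumed reading: the auto-suggested criterion is not in the human count.
human_scored = constructed_response_items + large_task_criteria - auto_suggested_criteria
not_human_scored = total_cil_items - human_scored

print(human_scored)      # 53
print(not_human_scored)  # 49
```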


During the scoring training, which was conducted at the international level, national center staff 
members learned how to score the constructed-response items and to use the scoring criteria 
for the large-task items in the ICILS 2018 assessment. Scoring training took place before both 
the field trial and the main survey. The training that took place prior to the field trial provided the 
participants with their first opportunity to give extensive feedback on the scoring guides, which 
were then revised on the basis of this feedback. The training conducted before the main survey 
enabled national center staff to give additional feedback on the scoring guides, with that feedback
based on their experiences of scoring the field-trial items. The scoring guides for the three ICILS
trend modules were not revised and were identical to those used in the first ICILS cycle. Further
details of the development and revision of the ICILS 2018 main survey scoring guide for open-ended
response items are provided in Chapter 2.


The main survey scorer training employed a sample set of student responses collected during the
field trial in English-speaking ICILS 2018 countries. The example responses used during scorer 
training were a mixture of those that clearly represented the scoring categories and those that 
were relatively difficult to score because they were partially ambiguous, unusually expressed, or on 
the “borderlines” of scoring categories. The scores that national center staff gave to these example 
responses were shared with the group, with discussion focusing on discrepancies in particular. The 
scoring guides and practice responses were refined following the scoring training to clarify areas 
of uncertainty identified during the scorer training. 


ICILS 2018 TECHNICAL REPORT 


Once training had been completed, the ISC provided national centers with a final set of scored sample responses as well as the final version of the scoring guide. The scored sample responses were accessible electronically through the web-based scoring system and were available only in English. National centers used this information, as they saw fit, to train their scoring staff on how to apply the scoring guides to the constructed-response items and large tasks. In some cases, national centers created their own sets of example responses from the student responses collected in their country.


To prepare for this task, the ISC provided national centers not only with suggestions on how to organize staff, but also with materials, procedures, and details on the scoring process. The ISC encouraged the national centers to hire scorers who were attentive to detail, familiar with education, and who, to the greatest extent possible, had a background in CIL. The ISC also provided guidelines on how to train scorers to accurately and reliably score the items and tasks.


Documenting scoring reliability


Documenting the reliability of the scoring process within countries was a highly important aspect of monitoring and maintaining the quality of the ICILS 2018 scored data. Scoring reliability within each country required two different scorers to independently score a random sample of 20 percent of responses for each constructed-response item and each large task.


The selection of responses to be double-scored and the allocation of these responses to scorers were random and managed by the web-based scoring software. The software was set up to ensure that a random selection of 20 percent of all responses was double-scored, and that scoring could begin before all student responses had been uploaded to the system (thus allowing for late returns of data from some schools). The software set-up also allowed these tasks to be accomplished without compromising the selection probability of each piece of work for double scoring.


The degree of agreement between the scores, as assigned by the two scorers, provided a measure of the reliability of the scoring process. The web-based scoring system was able to provide real-time inter-rater reliability reports to scoring leaders, who were encouraged (but not required) to use this information to help them monitor the quality of the scoring. Scoring leaders could, for example, use the information to monitor the agreement of each scorer with their colleagues (and identify scorers whose agreement was low relative to others), or identify items or tasks with relatively low inter-rater reliability that might need to be rescored or to have scorers provided with some additional training to improve the quality of their scoring.


Items with relatively low inter-rater reliability within a given country were not used in the estimation of student achievement for that country. Chapter 11 outlines the adjudication process relating to inter-rater reliability.


Field trial procedures


The ICILS 2018 field trial was a smaller administration of the ICILS 2018 assessment; on average, approximately 1000 students were tested in each participating country. The international field trial was conducted from May to June 2017.


The field trial was crucial to the development of the ICILS 2018 assessment instruments and also served the purpose of testing the ICILS 2018 survey operations procedures in order to avoid any possible problems during the ICILS 2018 data collection.


The operational resources and procedures described in this chapter were used during the field trial under conditions approximating, as closely as possible, those of the main survey data collection. This process also allowed the NRCs and their staff to acquaint themselves with the activities, refine their national operations, and provide feedback that could be used to improve the data-collection procedures. The field trial resulted in some important modifications to survey operations procedures and contributed significantly to the successful implementation of ICILS 2018.
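The double-scoring design described under "Documenting scoring reliability" can be sketched as follows. The source does not say how the scoring software drew its 20 percent sample, so the blocked random selection below is an assumption. Responses are grouped in upload order into blocks of five, and one position per block is drawn at random. Each response then has a 1-in-5 chance of selection no matter how late it arrives, and exactly one response per complete block is selected. Percent exact agreement is used as one common agreement measure. The text does not name the statistic the real-time reports used, and all names here are hypothetical.

```python
import random


class BlockSelector:
    """Hypothetical double-scoring selector for one item or task."""

    def __init__(self, block_size: int = 5, seed=None):
        self.block_size = block_size  # 5 -> 20 percent double-scored
        self.rng = random.Random(seed)
        self._position = 0
        self._chosen = self.rng.randrange(block_size)

    def next_response(self) -> bool:
        """Return True if the response just uploaded is double-scored."""
        selected = self._position == self._chosen
        self._position += 1
        if self._position == self.block_size:  # start a new block
            self._position = 0
            self._chosen = self.rng.randrange(self.block_size)
        return selected


def percent_agreement(pairs):
    """Percent exact agreement between first and second scores."""
    if not pairs:
        raise ValueError("no double-scored responses")
    agree = sum(1 for first, second in pairs if first == second)
    return 100.0 * agree / len(pairs)


selector = BlockSelector(block_size=5, seed=2018)
flags = [selector.next_response() for _ in range(1000)]
print(sum(flags))                                           # 200
print(percent_agreement([(1, 1), (2, 1), (0, 0), (2, 2)]))  # 75.0
```

In practice, one selector would be kept per constructed-response item or large task, because the 20 percent rule applies to each of them separately.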




Summary 


Considerable efforts were made to ensure high standards of quality in the survey procedures for 
the ICILS 2018 data collection. NRCs played a key role in implementing the data collection in each 
participating country, during which they followed internationally agreed upon survey operations 
procedures. The international study consortium provided NRCs with a comprehensive set of 
manuals containing detailed guidelines for the preparation of the study, its administration, scoring 
of open-ended questions, and data processing. National centers also received tailored software 
packages for sampling and tracking students and teachers within schools, the computer-based
student assessment, data capture, and the online administration of contextual questionnaires. 
The international ICILS 2018 field trial in 2016-2017 was crucial for testing survey operations 
procedures in participating countries and contributed to the successful implementation of the 
main data collection. 


CHAPTER 9: 


Quality assurance procedures for ICILS 2018


Sandra Dohr, Lauren Musu, and David Ebbs 


Introduction 


Considerable effort was made to develop and standardize materials and procedures for ICILS 2018 in order to collect high-quality and comparable data across countries to the greatest extent possible. For this purpose, quality assurance was an integral part of ICILS 2018 and encompassed all main activities from the ICILS framework. The activities included assessment and questionnaire development, sampling, instrument preparation (including the verification of national versions), data collection, scaling, and data analysis. Moreover, quality assurance activities occurred at both international and national levels. Therefore, various members of the consortium and the national study centers were part of the quality assurance components for ICILS 2018. This approach provided the consortium with diverse perspectives, which helped to detect potential issues concerning these survey procedures. It also supported the data adjudication process (see Chapter 11) and provided possible explanations for potential inconsistencies or irregularities in the data.


The aim of this chapter is to provide an overview and present results of quality assurance activities that occurred during the data collection for ICILS 2018 and of survey activities on a national level. Specifically, this chapter focuses on the following two aspects of the overall quality assurance activities in ICILS 2018:


• International quality control program: The consortium developed and implemented an international quality control program to monitor data collection activities and the compliance of national study centers with the international procedural standards. For this purpose, IEA contracted and trained independent international quality observers (IQOs). Each IQO visited 15 schools selected from the school sample in the respective country to observe an ICILS testing session, interview the school coordinator and test administrator, and document their observations.

• Survey activities questionnaire: The consortium developed a comprehensive questionnaire to gain insights into the national activities before, during, and after the administration of ICILS 2018. National research coordinators (NRCs) had to complete the questionnaire, which covered 13 different areas, including sampling, communication with schools, instrument preparation, data collection, scoring, and data submission. NRCs also reported on national quality control activities in the survey activities questionnaire.


International quality control program


Compliance with internationally standardized procedures and guidelines is highly important to achieve the overarching goal of high-quality, internationally comparable data. The international quality control program, which was conducted during the ICILS main survey, was designed and carried out by IEA. Its purpose was to monitor data collection activities for ICILS 2018 on a national level. The program was essential to ensure that participating countries took a rigorous, standardized approach to data collection. For this purpose, IEA appointed IQOs for each participating country. The IQOs' main task was to document data collection activities and determine whether the ICILS assessment was administered in compliance with the standardized procedures in their respective countries.




Selecting and training international quality observers (IQOs)
IQOs are independent experts who have experience with school environments and, ideally, the ICT context, reside in the participating country, and have no affiliation with the national study centers. To facilitate the recruitment process, IEA asked NRCs to nominate two candidates to serve as IQOs for ICILS 2018 in their respective countries. This involved sending IEA the candidates' CVs and a short explanation of why the NRC considered the nominees suitable for the position. Potential candidates had to be familiar with school environments or the day-to-day operations of schools and needed to be ICT literate. Additionally, they needed to be fluent in English and in the language(s) in which the assessment was administered in the target country. IQOs were also required to have no professional or personal affiliation with the national study center. Potential candidates were, for instance, school inspectors, relevant ministry officials, retired school teachers, or school principals. After carefully reviewing the CVs and considering the NRC's recommendation, IEA selected the most suitable candidate for the IQO position in every country.


Following the selection process, IEA invited the selected candidates to Amsterdam for a one-and-a-half-day in-person training. During the training, they were familiarized with the content of the study, the general procedures, and their tasks and responsibilities as IQOs. They were also informed about the possibility of hiring IQO assistants to reduce their workload. This was particularly advised for large countries, in order to cover different regions, states, or territories of the country, or in the case of a very short data collection window. If IQOs decided to appoint an assistant, they were solely responsible for the assistant's training, communication, payment, and any other organizational tasks. However, IQOs needed to inform IEA about their assistants and were required to submit confidentiality agreements signed by the assistants. In addition to the training, IQOs received the following materials, which they needed to familiarize themselves with the ICILS 2018 procedures and their responsibilities, and to complete their documentation:


• International Quality Observer Manual
• International versions of the principal, teacher, and ICT coordinator questionnaires
• National adaptation form (NAF) template
• International version of the School Coordinator Manual
• International version of the Test Administrator Manual
• School visit travel form
• School visit tracking form
• Administration observation record
• Translation verification report
• Checklist for collecting materials from the NRC
• Confidentiality agreement to be signed by IQO assistants


In addition to the information provided to IQOs during the training seminar, the above-listed
materials contained all the instructions needed to serve as an IQO for ICILS 2018. After the
training seminar and during the IQOs' fieldwork, IEA remained in close contact with the IQOs
and provided further assistance and instructions if required, especially in the case of unforeseen
circumstances.


QUALITY ASSURANCE PROCEDURES 111 


Overview of IQOs' responsibilities
The IQOs' tasks and responsibilities were described in great detail in the IQO manual
and conveyed during the training seminar. Their main responsibility was to monitor the data
collection activities in their countries, document their findings, and report them to IEA. More
precisely, IQOs' tasks covered the following:


• Familiarizing themselves with the ICILS content and context
• Visiting the NRC and collecting national instruments and other materials
• Selecting 15 schools for international quality control
• Contacting the schools selected for international quality control and arranging visits
• Observing an ICILS testing session in each selected school
• Interviewing the school coordinator and test administrator of each selected school
• Completing the translation verification report
• Completing the documentation and reporting to IEA


Visiting the national research coordinator (NRC) 


Before the start of the data collection period for the ICILS 2018 main survey, IQOs were required
to visit the NRC in their country. During this visit, IQOs had to select, in collaboration with the NRC,
a sub-sample of schools for the observation of the student assessments. In addition, they had
to collect the national materials needed to conduct the school visits and a set of the national
study instruments. NRCs provided IQOs with the following materials:


• USB flash drive containing the national version(s) of the student test and questionnaire
• Printed or digital versions of the ICT coordinator, principal, and teacher questionnaires
• Printed copies of the national version(s) of the School Coordinator Manual for each administered language
• Printed copies of the national version(s) of the Test Administrator Manual for each administered language
• Student listing and tracking forms for each selected school
• Teacher listing and tracking forms for each selected school
• Contact information for the selected schools


The listed forms and documents were needed as a reference during the testing session observations,
for the interviews with the school coordinators and test administrators, and to complete the IQOs'
documentation. After IQOs had completed their tasks and fulfilled their contract, they were asked to
send all materials that they had received from the NRC to IEA.


The task of determining the sub-sample of schools for international quality control included selecting 15 schools, plus three extra schools as replacements in case the IQO had difficulties contacting the initially selected schools or encountered other unforeseen circumstances. Ideally, and to the extent possible, the sampled schools were chosen following a random selection process. However, a random selection could be subject to a number of practical constraints. Therefore, IQOs were allowed to exclude schools outside of a reachable driving distance from where they or their assistants resided. Another practical constraint was that the school should not already have been selected for participation in the national quality control program (see the Survey activities questionnaire section below for more information about this program). Despite these constraints, IQOs were asked to attempt to visit schools in different areas or regions of their country in order to ensure adequate geographical coverage. In the case of very large countries, IQOs were strongly advised to appoint assistants for this purpose. Following the selection of the schools to be visited for international quality control, IQOs needed to send their selection to IEA for approval. For this purpose, IEA mainly assessed the number of selected schools, geographical coverage, and financial implications. Due to the country size of Luxembourg and its smaller sample size of schools, the study consortium approved that the IQO in this country would visit only 10 schools. For Germany, the IQO visited 20 schools in total: 15 for Germany as a whole and an additional five schools for the benchmarking region North Rhine-Westphalia. Altogether, IQOs visited the agreed-upon number of schools, which totaled 195 schools in 14 educational systems.
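The selection step described above can be sketched as a constrained random draw. This is a hypothetical illustration, not IEA's procedure. It assumes that the IQO has a set of schools within driving distance and a set already assigned to the national quality control program. It removes both exclusions and then draws 15 schools plus three replacements at random. The geographical spread that IQOs were asked to aim for is not enforced here. A stratified draw by region would be one way to add it.

```python
import random


def select_qc_schools(schools, reachable, in_national_qc,
                      n_main=15, n_reserve=3, seed=None):
    """Randomly draw schools for international quality control.

    Hypothetical sketch: schools outside driving distance or already in
    the national quality control program are excluded first; the first
    n_main drawn schools are selected and the next n_reserve are the
    replacements.
    """
    eligible = [s for s in schools if s in reachable and s not in in_national_qc]
    if len(eligible) < n_main + n_reserve:
        raise ValueError("not enough eligible schools")
    rng = random.Random(seed)
    drawn = rng.sample(eligible, n_main + n_reserve)
    return drawn[:n_main], drawn[n_main:]


schools = [f"S{i:03d}" for i in range(150)]
reachable = set(schools[:100])  # hypothetical: within driving distance
national = set(schools[:10])    # hypothetical: in the national QC program
main, reserve = select_qc_schools(schools, reachable, national, seed=7)
print(len(main), len(reserve))  # 15 3
```

Passing `n_main=10` would cover a smaller allocation such as the one approved for Luxembourg.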


Testing session observations 


The main responsibility of IQOs was to conduct an on-site observation of an ICILS testing session in each of the selected schools. For this purpose, IQOs contacted the school coordinators for every selected school prior to the beginning of the main survey data collection to obtain information about the testing day and to arrange the visit. In the case of a very short testing window in the respective country or colliding testing days, IQOs could either appoint assistants or consider using one of the replacement schools. In preparation for the observation of the testing session, IQOs needed to prepare print-outs or have an electronic device where they could access the administration observation record and the student and teacher listing and tracking forms for the respective schools. They also had to familiarize themselves with the Test Administrator Manual and School Coordinator Manual. While IQOs could decide for themselves how they recorded their findings during the school visits, they were required to enter the results of their observations and their notes in an online version of the administration observation record after every school visit. The online version of the administration observation record was accessible through the IEA Online Survey System, and all information needed to be entered in English.


While observing, IQOs were asked to use the administration observation record to structure their
observation of the testing session and document their findings. The administration observation
record included multiple-choice and open-ended questions, and consisted of four different content
areas: (a) the ICILS administration arrangements and settings; (b) the ICILS administration process;
(c) summary observations and general impressions of the ICILS administration; and (d) the interview
with the school coordinator and test administrator. The following section presents the data derived
from these administration observations.


ICILS administration arrangements


The first section of the administration observation record asked about the assessment conditions 
and general preparations regarding the ICILS administration. It covered the setup of the room 
where the session took place and the arrangement of assessment materials. According to the 
answers in the administration observation record, around two thirds of the |QOs (67%) met the 
school coordinator prior to the preparation of the assessment room upon arriving at the schools. 
The preparation of the assessment room began about 60 minutes before the testing session started 
in around half of the schools (49%) and in approximately a third of the cases (36%) even earlier. 
IQOs also reported in most of the cases (91%) that the testing materials were safely stored and
securely sealed when they arrived at the school.


Generally, the testing rooms seemed to be in order and well prepared in the majority of the visited
schools. Table 9.1 shows detailed answers about the set-up of the testing rooms and preparations,
including the seating space for students, the set-up of the workstations, and preparatory
actions taken by the test administrator.




Table 9.1: Preparation of the testing room

Question                                                              Response category (%)
                                                                      Yes   No   Missing
Did the test administrator receive an adequate supply of USB          99    1    -
flash drives? (if applicable)
Is there adequate seating space for students to avoid unwanted        95    5    -
distractions while testing is in progress?
Is there adequate space for the test administrator to move            98    2    -
around the room while the testing is in progress?
Does the test administrator have a watch or use another type of       97    3    -
device to keep track of time while the testing is in progress?
Did the test administrator set up all workstations with each of       81    19   -
them displaying the welcome screen prior to the students' arrival?
Is the test administrator ensuring that each student is sitting in    93    6    1
front of the computer specially prepared for him or her?

Note: Percentages derived from a total of 195 responses.


During the ICILS administration

In the second section of the administration observation record, IQOs documented their observations of various aspects related to the administration of the student test and student questionnaire. This included observing and reporting on how accurately the test administrator followed the prescribed administration procedures and administration scripts. To complete this section in the administration observation record, IQOs had to refer to the administration scripts in the Test Administrator Manual and to the student tracking form. IQOs documented if test administrators followed the script, which was largely the case (see Table 9.2). If changes were made to the script, they were mainly of a minor nature. In these cases, the test administrators added information or revised instructions from the script slightly. Only in rare cases was content deleted, for example, if students were inattentive or lost concentration.

Table 9.2: Test administrator adherence to the administration script

Script type | No changes (%) | Minor changes (%) | Major changes (%) | Missing (%)
ICILS assessment | 68 | 28 | 4 | -
Computational thinking modules* | 76 | 23 | 1 | -
Student questionnaire | 76 | 20 | 3 | 2

Notes: Percentages derived from a total of 195 responses. Because results are rounded to the nearest whole number, some totals may appear inconsistent.
* Only applicable in eight countries.
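The rounding caveat in the notes can be illustrated with a small sketch. The counts below are hypothetical (they are not taken from the ICILS data); each category is converted to a whole-number percentage of 195 responses, and the rounded shares can add up to 99 or 101 even though the underlying counts sum exactly to 195.

```python
def rounded_shares(counts):
    """Convert category counts to whole-number percentages of their total."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


# Hypothetical counts for three response categories out of 195 observed sessions.
for counts in [(65, 65, 65), (1, 1, 193), (133, 54, 8)]:
    shares = rounded_shares(counts)
    print(counts, shares, sum(shares))
# (65, 65, 65) [33, 33, 33] 99
# (1, 1, 193) [1, 1, 99] 101
# (133, 54, 8) [68, 28, 4] 100
```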


Concerning the computers or tablets used for the ICILS administration, IQOs reported that in almost all cases (92%) the modules being displayed on the screens during the ICILS assessment corresponded with the information listed on the student tracking form for each student in the room. In almost all observed testing sessions (99%), exiting the testing software on the student devices ran smoothly and the screens reverted to the initial login screen.


114 ICILS 2018 TECHNICAL REPORT 


As for students' compliance with the allocated time for the administration, IQOs reported that the majority of the students (92%) stopped working immediately after the allowable time period for the ICILS assessment ended. For countries that administered the computational thinking (CT) modules, the majority of the students (96%) also stopped working immediately after the allocated time period for the CT modules ended. In nine percent of the cases, there were students in the testing session who completed the assessment before the end of the allowable time period. Regarding the administration of the student questionnaire, approximately a third of the students (32%) asked for additional time to complete the questionnaire. In these cases, students were given between one and 30 additional minutes (on average around seven minutes) to complete the student questionnaire.


General impressions of the IQO


The third section of the administration observation record was dedicated to general impressions and observations of the IQO during the ICILS administration. In general, IQOs described the overall quality of the ICILS administration as excellent in about half of the cases (46%), as very good in around a third of the cases (35%), and as good in around an eighth of the cases (13%). A few sessions were considered to be of only fair (5%) or poor (1%) quality. Most IQOs were also under the impression that test administrators had familiarized themselves with the procedures and scripts prior to the test administration (95%).


Concerning the technical infrastructure at the schools, IQOs did not observe many technical issues (see Table 9.3). In some cases, problems occurred when logging in, when using the USB flash drives, or through other malfunctions of the ICILS assessment delivery system. These other malfunctions included error messages appearing, sudden interruption or locking of a module, or frozen screens. IQOs reported that in some testing sessions it was necessary to change tablets, replace USB flash drives, or restart computers to resolve the problems that occurred. IQOs considered the actions taken by the test administrators to solve problems as efficient most of the time (95%).


Table 9.3: Technical problems experienced with procedures or devices

Procedures/devices | Yes (%) | No (%) | Missing (%)
Logging in | 13 | 87 | -
Using USB flash drives | 3 | 82 | 15
Timer function | 2 | 97 | 1
Other malfunctions when using the delivery system | 21 | 78 | 1
Keyboard (switching languages) | 3 | 94 | 3

Note: Percentages derived from a total of 195 responses.


Further, IQOs were under the impression that test administrators recorded students' attendance on the student tracking form correctly (100%). In total, there were only a few sessions with students who refused to take the assessment either prior to or during the administration (5%). IQOs reported that in about three quarters of the observed sessions (76%) students did not have particular problems with the ICILS administration. Where there were issues, IQOs reported students getting tired or losing concentration due to the length of the assessment. Furthermore, it appears that some students had difficulty understanding certain exercises. Overall, IQOs considered the students in around two thirds of the observed testing sessions as extremely orderly and cooperative (68%) and in a bit less than a third of the sessions (28%) as at least moderately orderly and cooperative. In most of the sessions (87%), IQOs did not observe students attempting to cheat or students who did not pay attention. In the case of non-cooperative and disorderly students, IQOs reported that most of the test administrators made an effort to control the students and the situation (84%).




Interview with school coordinators and test administrators 


The final section of the administration observation record was dedicated to interviews with the school coordinator and test administrator. The purpose of these interviews was to obtain their evaluation of the ICILS administration, gather suggestions for improvement, and acquire additional background information.


School coordinators were mainly internal school staff. Fifty-one percent were school principals 
or other members of school management, 37 percent were teachers in the school where the 
assessment took place, and nine percent were other school staff members. In contrast, when 
it comes to test administrators, about a third of them were internal to the school where the 
assessment took place and about two thirds were external. More precisely, they were principals 
(8%), teachers of ICT (10%), teachers of another subject (13%), other school staff (6%), ICILS 
national center staff (17%), or they had an external position (46%). 
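The "about a third" and "about two thirds" figures follow from grouping the six reported positions. A minimal sketch, assuming (as the paragraph implies) that principals, teachers, and other school staff count as internal to the school, while ICILS national center staff and other external positions count as external:

```python
# Shares of test administrators by position, as reported above (in percent).
positions = {
    "principal": 8,
    "teacher of ICT": 10,
    "teacher of another subject": 13,
    "other school staff": 6,
    "ICILS national center staff": 17,
    "external position": 46,
}

# Assumed grouping: the first four categories are internal to the school.
INTERNAL = {"principal", "teacher of ICT", "teacher of another subject", "other school staff"}

internal = sum(share for role, share in positions.items() if role in INTERNAL)
external = sum(share for role, share in positions.items() if role not in INTERNAL)
print(internal, external, internal + external)  # 37 63 100
```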


When asked about the overall quality of the ICILS administration, 77 percent of all school coordinators said that it went very well and without problems, and 16 percent answered that it went satisfactorily with only a few problems. Where there were problems, school coordinators sometimes described difficulties with organizing and arranging the administration. In some schools, there were technical issues or problems with using technology. In most of the cases, school coordinators rated the general attitude of other school staff members towards the ICILS assessment as positive (58%) or neutral (32%). Furthermore, school coordinators reported to IQOs that approximately half of the students (54%) received some sort of special instructions, motivational talks, or incentives to prepare them for the assessment.


With regard to the School Coordinator Manual, the majority of the school coordinators (89%) felt 
that the manual worked well and did not need improvement. However, some suggestions for 
improvement were given and included shortening the manual, a better structure of the manual, or 
clearer instructions in certain areas. In addition to the School Coordinator Manual, around half of the
school coordinators (47%) received some sort of additional training, instructions, motivational talks, 
or incentives from the national centers. Regarding the Test Administrator Manual, three quarters of 
the test administrators (75%) felt the manual worked well and did not require improvement. Some 
test administrators suggested improvements that included more specific explanations for certain 
aspects of the assessments, clearer instructions, or reducing redundancy. 


IQOs asked school coordinators about various forms used during the ICILS administration and
if the information on these forms was correct. Table 9.4 shows that teacher listing forms and 
student listing forms contained correct information in the majority of the schools, and that the 
teacher questionnaires or cover letters were distributed according to the teacher tracking form 
in almost all cases. 


Table 9.4: Teacher and student listing forms and teacher tracking forms used for the assessment 


Question | Yes (%) | No (%) | Missing (%)
Did the teacher listing form include all eligible teachers as listed in the school timetable for the target grade? | 93 | 5 | 2
Were all students participating in the ICILS assessment listed on the student listing form? | 93 | 4 | 3
Were the teacher questionnaires/cover letters distributed according to the teacher tracking form? | 85 | 9 | 7

Notes: Percentages derived from a total of 195 responses. Because results are rounded to the nearest whole number, some totals may appear inconsistent.




Translation verification report 




All national study instruments underwent a thorough translation verification process before administration (see Chapter 5 for more information). The purpose of translation verification was to ensure that the language remains faithful and equivalent in meaning as intended by the international source version of the instruments. Upon completion of translation verification, the instruments containing translation verifier comments and suggestions were released back to the NRC. The NRC reviewed the verifier's feedback and had the final decision on whether or not to adopt the recommendations. The consortium requested all NRCs to respond to the verifier's feedback and to document whether they agreed or disagreed with the verifier's suggestion or wanted to modify the translation further.

The IQO's task was to review the final national study instruments and the NRC's response to the translation verification feedback. They had to check whether or not the NRC adopted the verifier's suggestions and whether or not they documented their actions correctly. For this purpose, IQOs referred to the final set of teacher, principal, ICT coordinator, and student questionnaires, and the student test modules, which they received from the NRC. In addition, IEA provided IQOs with translation exports from the translation system, which included the NRC's response to the verifier's comments. IQOs had to compare the translations after translation verification with the final translations and confirm if the NRC's response corresponded with the applied changes. IQOs were also asked to give their opinion on the overall quality of the verifier's work and the overall use of the verifier's feedback by the NRC. IQOs had to document their findings in the translation export and the translation verification report.

In addition to the review of the ICILS 2018 study instruments, IQOs had to report on the appropriateness of the national adaptation of the School Coordinator Manual and Test Administrator Manual. National study centers had to adapt these manuals to their national context. The national versions did not undergo any verification process performed by the consortium. The IQO's task was to review the national versions of these manuals and document if they were consistent with the international templates. For the School Coordinator Manual, IQOs checked the alignment of the sections that described the role of the school coordinator, preparing for within-school sampling of students and teachers, preparing for the administration, administering the questionnaires, and quality control during the main survey. For the Test Administrator Manual, IQOs reported on the alignment of sections about the role of the test administrator, conducting the assessment, administering the test, and troubleshooting. IQOs also documented if additions were made to the national version of the manuals.


Survey activities questionnaire 


The survey activities questionnaire (SAQ) is a comprehensive questionnaire that covers different areas of the implementation of the ICILS 2018 main survey procedures. National centers of all participating educational systems¹ gave their feedback on the general approach, standardized procedures, and materials provided by the consortium (e.g., manuals, software, forms) for the ICILS 2018 main survey. They also reported on difficulties they experienced and provided suggestions for improvements for future ICILS cycles. The SAQ addressed the following areas of interest:

- Sampling
- Contacting schools and recruiting school coordinators
- Implementing and documenting national adaptations using the NAF
- Translating instruments using the IEA Translation System and the AssessmentMaster (AM) system from RM Results, AM Designer
- Preparing online adult questionnaires using the IEA Translation System
- Assembling and preparing the ICILS materials for administration
- Administering ICILS
- National quality control activities
- Scoring open-ended response items
- Coding occupation data using the IEA Coding Expert software
- Entering data manually and submitting data
- Time required for survey activities
- Other experiences

Overall, the feedback provided by the NRCs was considered valuable and fruitful, and it helped the consortium to gain a better perspective on the survey activities at the national level. The following sections summarize the most important results and insights from the SAQ.

¹ Germany and North Rhine-Westphalia collaborated in filling in the SAQ. Therefore, there were 13 country entries in total.


Sampling

With regard to collaboration with the IEA sampling team, all NRCs reported that they felt well-supported in all sampling-related matters. They further informed the consortium that they did not find it difficult to adapt the international sampling design to their national specifications. Only one country considered this as somewhat difficult. The same is the case for creating a sampling frame, which lists all eligible schools. Twelve out of 13 national centers did not find this task difficult at all. NRCs further considered the Survey Operations Procedures Unit 1 (Sampling Schools) as sufficiently describing the procedures when it comes to defining and identifying the target populations of the survey and developing national sampling plans. National study centers reported that they had no or only some difficulties in selecting the student and teacher samples with the IEA Windows Within-School Sampling Software.

Contacting schools and recruiting school coordinators

National study centers contacted the sampled schools in multiple ways. Eleven countries contacted the schools by phone, six by regular mail, eleven by email, and five in other ways, including websites or personal visits. For contacting the schools, nine national study centers used letters based on other national projects, one used example letters provided in the Survey Operation Procedures Unit 2 (Working with Schools), and three used other kinds of letters.

The SAQ further asked if schools' participation in ICILS 2018 was compulsory in the participating countries. Seven countries reported that participation was not compulsory in any school. In four countries participation was compulsory for all schools, and in two countries it was compulsory for some of the schools, for instance, due to different regulations in federal states. Nine out of 13 NRCs reported having difficulties in convincing schools to participate. Reasons for this included schools' participation in other international studies, ongoing school reforms, or busy school schedules.

National study centers trained school coordinators in different ways and sometimes combined training methods. The training methods included formal in-person training sessions, instructions via telephone, emails, videos, and/or conventional written instructions. Other methods included webinars or visiting schools in person. Table 9.5 shows more detailed information about school coordinator trainings as well as difficulties school coordinators reported with understanding certain aspects of the ICILS administration.




Table 9.5: School coordinator information

Question | Response category (%): Yes | No
How did you train the school coordinators?
  Formal training session | 5 |
  Through telephone, email, or video-link | 4 |
  Written instructions | 8 |
  Other | |
Did the school coordinator report difficulties with understanding any of the following aspects of ICILS administration?*
  Identifying eligible teachers and/or students | |
  The necessity for listing all target grade teachers | |
  The necessity for listing all target grade students | 4 |
  Flagging students to be excluded prior to school sampling | |
  The rationale for sampling students from the target grade across classrooms | |
  The process for running the USB compatibility tests in schools | |
  The steps to set up computers in schools to support successful test administration | 3 | 9
  The test administration procedures | |

Notes: Percentages derived from a total of 13 responses.
* One country did not answer this question.


Implementing and documenting national adaptations and translating instruments

The SAQ asked national centers about their experiences with the adaptation of the international versions of the ICILS 2018 study instruments. None of the countries reported difficulties adapting the student test modules and the school principal questionnaire. A few countries had difficulties translating the student questionnaire (2), the teacher questionnaire (1), and the ICT coordinator questionnaire (1). When documenting national adaptations in the NAFs, almost all countries reported not having difficulties (11).

Around half of the countries (6) did not report difficulties with translating the student instruments into national languages using AM Designer, which was provided by the consortium for translating the student test and questionnaire. The rest of the countries (7) experienced some difficulties with the translation process. These difficulties included technical problems when using the translation system and its general functionality. Regarding the translation of the ICILS principal, teacher, and ICT coordinator questionnaires using the IEA Translation System, the majority of the countries did not experience any difficulties (10). Where there were difficulties, countries reported issues with the general usability and user-friendliness of the system. Translation verification feedback provided by IEA was considered by almost all countries (12) as very useful or at least somewhat useful. In general, most of the national study centers did not report major problems regarding the adaptation verification (12) and translation verification (11) processes.

Preparing the ICILS materials for administration

National centers were asked to share their experiences with the preparations of the ICILS materials for administration. A few countries (4) reported having difficulties during the layout verification process of the ICILS delivery system for the student instruments. These difficulties were mainly due to the functionality and usage of the system. The majority of the countries (12) did not report any major problems when it came to the verification process for the online questionnaires. Most of the countries did not experience major difficulties when downloading and replicating the ICILS delivery system to the USB flash drives, which included downloading the test file image (13), extracting the test file to a USB flash drive (12), and creating copies of the USB flash drives for use in the testing (11).


Administering ICILS

Some countries (4) reported problems concerning the web application that occurred during the administration of the online questionnaires. Additionally, five countries observed some issues with the login procedure during the administration of the online questionnaires. These included issues with accessing the online questionnaire, or respondent identification (ID) and passwords that did not work properly. Problems during the administration of the ICILS assessment included malfunctions of the delivery system (e.g., freezing of the screen), which in some cases led to the rebooting of the delivery system, or other technical issues (e.g., corruption of USB flash drives, issues with the firewall).


Concerning the participation of the ICILS populations, some national study centers experienced difficulties in achieving high participation rates. Six countries reported problems with student participation, mainly due to difficulties with receiving parents' consent or student refusal. Eight countries experienced problems with teachers' participation. Reasons included teachers' workload, lack of motivation, or missing support from teacher unions. Four countries reported problems with ICT coordinators' and principals' participation rates.


Scoring open-ended response items 


Scoring activities for open-ended response items in the participating countries were mainly performed by national center staff (3), teachers or professional educators (9), or university students (5). On average, 12 scorers were used per country. A few countries (5) reported that their scorers had difficulties using the scoring system for training and scoring the student work, which included problems with accessing the software, speed of the application, or displaying pictures appropriately. Further, six countries reported difficulties with the reports for reliability scoring.


Entering and coding occupation data 


For coding occupations, most countries mainly used their own staff members (9). In addition, a few countries used experts from external organizations (3) or university students (1). On average, countries used four coders for coding the occupation data. Most of the national study centers (12) made use of the International Standard Classification of Occupations (ISCO-08) scheme for coding occupations and used either existing translations or their own translations of the scheme. None of the countries reported problems with converting the national coding scheme to ISCO-08. Seven countries reported having difficulties with coding the student responses and one country [...] about working with the IEA Coding Expert software.

Entering data manually and submitting data

Entering data manually was only applicable for three countries, as all other countries administered the teacher, principal, and ICT coordinator questionnaires solely online. One of these three countries used the IEA Data Management Expert (IEA DME) software to enter the derived data from the questionnaires.

National quality control activities

In addition to the international quality control program, quality control at the national level also occurred during the data collection. National study centers were responsible for conducting a national quality control program, the composition of which was at the discretion of the national study center. However, the consortium provided recommendations and guidelines in the form of a National Quality Observer Manual to NRCs. The recommendations included recruiting and training NQOs, visiting 10 percent (or a minimum of 15) of the sampled schools to observe a testing session, and interviewing the school coordinator and test administrator in each school. The manual included instructions for NQOs and a template ICILS administration observation record. However, countries could decide whether they would use the manual and could adjust it where necessary.
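The school-visit recommendation can be read as "10 percent of the sampled schools, but never fewer than 15". The sketch below is a minimal illustration under that reading, with two further assumptions that are not stated in the manual description: fractional schools are rounded up, and the count is capped at the number of sampled schools. The function name nqo_school_visits is hypothetical.

```python
def nqo_school_visits(n_sampled: int, percent: int = 10, minimum: int = 15) -> int:
    """Schools to visit under a '10 percent, at least 15' reading of the recommendation.

    Assumptions for illustration: partial schools are rounded up, and the
    result is capped at the number of sampled schools.
    """
    if n_sampled < 0:
        raise ValueError("n_sampled must be non-negative")
    ten_percent = -(-n_sampled * percent // 100)  # integer ceiling division, avoids float error
    return min(max(minimum, ten_percent), n_sampled)


for n in (120, 150, 203, 300, 10):
    print(n, nqo_school_visits(n))
# 120 15
# 150 15
# 203 21
# 300 30
# 10 10
```

Integer ceiling division is used so that a share such as 10 percent of 150 is exactly 15 rather than a floating-point value slightly above or below it.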


All 13 participants reported that they conducted national quality control. However, specific national quality control activities varied from country to country. Each country appointed one or more NQOs, who were mainly staff members of the national center, external experts, or school inspectors. NQOs were trained in multiple ways, including formal training sessions (8), instructions via telephone, email, or videos (2), and written instructions (6). A minimum of six schools was visited in the course of the national quality control program in each country. Two countries reported conducting some sort of national quality control activities in all sampled schools, which included, for example, regular check-in calls with the test administrators. In one of these two countries, every testing session was observed on-site by a staff member of the national study center. Most national centers (12) reported that the visited schools were located in different regions of the country. The majority of the countries (11) made use of at least parts of the National Quality Observer Manual provided by the consortium. Adjustments to the manual included reducing the length of the document or adapting it to the national context. The manual was also used as a basis for producing training materials. Issues reported by NQOs included technical issues such as freezing or crashing of the software, or difficulties with using the USB compatibility check. NQOs further reported on minor procedural deviations, which included non-adherence to the allocated time for the testing session. They also provided feedback regarding the teacher and student tracking forms used during the administration and made suggestions for improvements.




Summary 


The ICILS consortium, in cooperation with the participating countries, developed and coordinated a range of quality assurance measures in order to monitor the quality of the study administration. This chapter focused on two quality assurance activities that occurred during the main survey for ICILS 2018, namely the international quality control program and the SAQ. The information presented focused on the structure and general approach of these two components as well as the most important results.




The IQOs were appointed to observe 15 testing sessions of the ICILS student assessment in each participating country and to interview the school coordinator and test administrator. Their main task was to monitor compliance with the internationally standardized procedures and report their findings to IEA. The SAQ was completed by all national study centers and shed light on different areas of the implementation of the ICILS 2018 main survey procedures, such as sampling, study preparation activities, administration of ICILS 2018, or data processing. NRCs provided their feedback on the general approach to ICILS 2018, the procedures, and the support materials provided by the consortium.


Taken together, these activities form an important source of information about the implementation 
of different steps and processes of ICILS 2018, taking into account different perspectives from 
the national and international levels. 


CHAPTER 10:

Data management and creation of the ICILS 2018 database

Ekaterina Mikheeva and Sebastian Meyer

Introduction

This chapter describes the procedures for checking the ICILS 2018 data and database creation that were implemented by IEA, the ICILS international study center (ISC) at ACER, and the national centers of the participating countries.

Preparing the ICILS 2018 international database and ensuring its integrity was a complex endeavor requiring extensive collaboration between IEA, the ISC, and the national centers. Once the countries had created their data files and submitted them to IEA, data cleaning began. Data cleaning is an extensive process of checking data for inconsistencies and formatting the data to create a standardized output. The main goals of the data cleaning process were to ensure:

- All information in the database conformed to the internationally defined data structure;
- The content of all codebooks and documentation appropriately reflected national adaptations to questionnaires; and
- All variables used for international comparisons were comparable across countries (after harmonization where necessary).

All institutions involved in this process applied quality control measures throughout, in order to assure the quality and accuracy of the ICILS 2018 data.

Data sources

Computer-based student assessment

As a computer-based assessment, ICILS 2018 exclusively generated electronic data from the student assessment along with paper or electronic tracking information. In general, one version of the student assessment software existed per country. It included all test modules and components, and supported all assessment languages.

Each student was assigned an instrument version that contained two of the five computer and information literacy (CIL) test modules as well as the student questionnaire. In those countries where the computational thinking (CT) modules were administered, students were assigned two additional CT assessment modules after the questionnaire session.

As the default administration method for ICILS 2018, the national centers provided schools with one USB stick per student (plus spare sticks in case of technical failure). If the computers at a school were deemed unsuitable for administering the student assessment, laptop computers were provided. The national centers also provided schools with a student tracking form that notified the test administrators of the students who were to be assigned the assessment and questionnaire. The form also allowed the test administrators to indicate student participation for subsequent verification.

In schools where the default administration method was utilized, data from the student assessment were stored on the individual USB sticks used to deliver the assessment. If the server method was utilized, then the student data were stored on the server laptop that was used to administer the assessment. At the time of running the student assessment, data were saved in a long data format. Each form of interaction performed by the student was saved as an "event" with a unique timestamp. After the assessment, the national centers used an upload tool to upload the data to a central international server. This tool was provided either on the USB sticks or on the server computer for those schools that administered the assessment through the laptop server.
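The instrument versions described above each combine two of the five CIL modules. How module pairs were allocated to students is not described in this section, so the sketch below only assumes one simple scheme: enumerate every ordered pair of distinct modules (5 × 4 = 20 versions, treating order as meaningful so that position effects could be balanced) and cycle through them across the students of a school. The module labels and the helper name assign_versions are hypothetical.

```python
from itertools import permutations

# Hypothetical labels for the five CIL test modules (the real module names are not used here).
CIL_MODULES = ["M1", "M2", "M3", "M4", "M5"]

# Every ordered pair of two distinct modules: 5 * 4 = 20 possible instrument versions.
VERSIONS = list(permutations(CIL_MODULES, 2))


def assign_versions(student_ids):
    """Cycle through all ordered module pairs so each version is used about equally often."""
    return {sid: VERSIONS[i % len(VERSIONS)] for i, sid in enumerate(student_ids)}


assignment = assign_versions([f"S{n:03d}" for n in range(1, 41)])
print(len(VERSIONS))       # 20
print(assignment["S001"])  # ('M1', 'M2')
print(assignment["S021"])  # ('M1', 'M2'), repeated after one full cycle
```

A random draw per student would also work under this assumption; cycling simply guarantees that, within a group of 20 students, every version appears exactly once.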




The data were transposed from long format to a wide format so that they could be transformed for processing and analysis. This process resulted in a predefined table structure containing one record per student as well as all variables that were to be used for data processing, analysis, and reporting.

During this step, a set of calculations needed to take place. Examples include automatic scoring of some of the complex or constructed-response items and large tasks, or aggregating time spent on an item across multiple visits to it.
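The long-to-wide step and the aggregation of time across visits can be sketched as follows. This is a minimal illustration, not the ICILS processing code: the `Event` record, its event types (`enter`, `leave`, `answer`), and the `<item>_time`/`<item>_resp` column names are assumptions made for the example. Time on an item is summed over every enter/leave pair, so repeated visits accumulate.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One logged interaction in long format (hypothetical structure)."""
    student_id: str
    item_id: str
    kind: str          # assumed event types: "enter", "leave", "answer"
    timestamp: float   # seconds since the start of the session
    value: str = ""    # response content for "answer" events


def to_wide(events):
    """Pivot a long event log into one record (dict) per student."""
    rows = defaultdict(dict)
    open_visits = {}  # (student, item) -> timestamp of the unmatched "enter"
    for ev in sorted(events, key=lambda e: (e.student_id, e.timestamp)):
        key = (ev.student_id, ev.item_id)
        row = rows[ev.student_id]
        if ev.kind == "enter":
            open_visits[key] = ev.timestamp
        elif ev.kind == "leave" and key in open_visits:
            # Add the duration of this visit to the time already spent on the item.
            spent = ev.timestamp - open_visits.pop(key)
            row[f"{ev.item_id}_time"] = row.get(f"{ev.item_id}_time", 0.0) + spent
        elif ev.kind == "answer":
            # Keep the most recent response given to the item.
            row[f"{ev.item_id}_resp"] = ev.value
    return dict(rows)


# Hypothetical log for one student who visits item A1 twice.
events = [
    Event("S1", "A1", "enter", 0),
    Event("S1", "A1", "answer", 10, "B"),
    Event("S1", "A1", "leave", 12),
    Event("S1", "A2", "enter", 12),
    Event("S1", "A2", "leave", 20),
    Event("S1", "A1", "enter", 20),
    Event("S1", "A1", "leave", 25),
]
wide = to_wide(events)
# wide["S1"] == {"A1_resp": "B", "A1_time": 17.0, "A2_time": 8.0}
```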


Online data collection of school and teacher questionnaires 


ICILS 2018 offered online collection of school and teacher questionnaire data. All participating countries adopted the online option as a default data-collection mode for all respondents (that is, school principals, ICT coordinators, and teachers). National centers had to ensure that individual respondents who refused to participate in the online mode or who did not have access to the required infrastructure for online participation were provided with a paper questionnaire, thereby ruling out unit nonresponse as a result of a forced administration mode.

To ensure confidentiality, national centers provided every respondent with a letter that contained individual login information along with information on how to access the online questionnaire. This login information corresponded to the respondent identification (ID) and checksum provided from the IEA Windows Within-School Sampling Software (IEA WinW3S), meaning that the identity validation step conducted at the national centers for paper-based questionnaires occurred when the respondents logged into the survey.

As respondents completed their online questionnaires, their data were automatically stored on the central international server. Data for each country-language combination were stored in a separate table on the server. The different language versions within countries were then merged (at IEA) with the data collected as part of the within-school sampling process.

Potential sources of error originating from the use of the two parallel modes had to be kept to the absolute minimum to ensure uniform and comparable conditions across modes and countries. To achieve this, ICILS 2018 questionnaires in both modes were self-administered, had identical contents and comparable layout and appearance, and required the data collection for both modes to take place over the same period of time.


Data entry and verification of paper questionnaires 


Data entry

Each national center was responsible for transcribing the information from paper-based principal, ICT coordinator, and teacher questionnaires into computer data files using the IEA Data Management Expert (IEA DME) software. National centers entered responses from the paper questionnaires into data files created from an internationally predefined codebook. The codebook contained information about the names, lengths, labels, valid ranges (for continuous measures or counts) or valid values (for nominal or ordinal questions), and missing codes for each variable in each of the three nonstudent questionnaire types.

IEA provided countries that needed to manually enter data from paper-based questionnaires with a codebook that reflected the nationally adapted and internationally verified questionnaire structure. National codebooks were exported directly from the online system to ensure consistency between online questionnaires and the data entry codebook.

In general, national centers were instructed to discard any questionnaires that were unused or that were returned completely empty, and to enter any questionnaire that contained at least one valid response. To ensure consistency across participating countries, the basic rule for data entry in the IEA DME required data to be entered “as is” without any interpretation, correction, truncation,


DATA MANAGEMENT AND CREATION OF THE ICILS 2018 DATABASE 123 


imputation, or cleaning. Resolution of any inconsistencies remaining after this data entry stage would occur during data cleaning (see below).
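A codebook entry of the kind described above can be modeled as a small record, with a check that only reports problems. This is a sketch under assumptions: the field layout, the `DEVICES` variable, and the missing code 9999 are hypothetical and not taken from the ICILS 2018 codebooks. In keeping with the “as is” rule, the check never alters the keyed value.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class CodebookEntry:
    """One variable in a data-entry codebook (hypothetical layout)."""
    name: str
    length: int                                    # maximum number of digits
    label: str
    valid_range: Optional[Tuple[int, int]] = None  # continuous measures or counts
    valid_values: FrozenSet[int] = frozenset()     # nominal or ordinal questions
    missing_codes: FrozenSet[int] = frozenset()


def check_value(entry: CodebookEntry, value: int) -> List[str]:
    """Return warnings for a keyed value; the value itself is never changed."""
    if value in entry.missing_codes:
        return []
    warnings = []
    if len(str(value)) > entry.length:
        warnings.append(f"{entry.name}: {value} exceeds {entry.length} digits")
    if entry.valid_range is not None:
        low, high = entry.valid_range
        if not low <= value <= high:
            warnings.append(f"{entry.name}: {value} outside range {low}-{high}")
    if entry.valid_values and value not in entry.valid_values:
        warnings.append(f"{entry.name}: {value} is not a valid value")
    return warnings


# Hypothetical variable: computers available to students, missing code 9999.
devices = CodebookEntry("DEVICES", 4, "Computers available to students",
                        valid_range=(0, 999), missing_codes=frozenset({9999}))
```

A value such as 1200 would be kept as entered but flagged for review, which mirrors the separation between data entry and later data cleaning.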

The rules for data entry included the following:

• Responses to categorical questions to be generally coded as “1” if the first option was marked, “2” if the second option was marked, and so on.

• Responses to “mark-all-that-apply” questions to be coded as either “1” (marked) or “9” (not marked/omitted).

• Responses to numerical or scale questions (e.g., school enrolment) to be entered “as is,” that is, without any correction or truncation, even if the value is outside the originally expected range (e.g., if an ICT coordinator reports more than 1000 computers available to students in the school).

• Likewise, responses to filter questions and filter-dependent questions to be entered exactly as filled in by the respondent, even if the information provided is logically inconsistent.

• If responses were not given at all, not given in the expected format, ambiguous, or in any other way conflicting (e.g., selection of two options in a multiple-choice question), the corresponding variable was to be coded as “omitted or invalid.”
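The entry rules above translate into simple coding functions. This is an illustrative sketch only: the function names are hypothetical, and the omitted or invalid code of 9 for one-digit variables is an assumption (in practice, the missing codes for each variable are defined in the codebook).

```python
OMITTED_OR_INVALID = 9  # assumed omitted/invalid code for one-digit variables


def code_single_choice(marked):
    """marked: set of 1-based positions of the options the respondent marked."""
    if len(marked) == 1:
        return next(iter(marked))   # "1" for the first option, "2" for the second, ...
    return OMITTED_OR_INVALID       # nothing marked, or two or more options marked


def code_mark_all(marked, n_options):
    """Mark-all-that-apply: 1 = marked, 9 = not marked/omitted."""
    return [1 if option in marked else 9 for option in range(1, n_options + 1)]


def code_numeric(raw, omitted_code):
    """Enter numbers as is (no correction or truncation); unreadable -> omitted."""
    text = raw.strip()
    return int(text) if text.isdigit() else omitted_code
```

For example, `code_numeric("1200", 9999)` keeps the implausible value 1200 unchanged, leaving its resolution to data cleaning.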


Data entered with the IEA DME were automatically validated. First, the entered respondent ID was validated with a five-digit code (the checksum) generated by the IEA WinW3S. A mistype in either the ID or the checksum resulted in an error message that prompted the data-entry person to check the entered values. The data-verification module of the IEA DME also enabled identification of a range of problems such as inconsistencies in ID codes and out-of-range or otherwise invalid codes. Individuals entering the data had to resolve problems or confirm potential problems before they could resume data entry.
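The algorithm that IEA WinW3S uses to generate checksums is not described here, so the following is a hypothetical stand-in that only illustrates the principle: a checksum derived from the ID lets the software detect a mistyped ID or checksum. The weighted digit sum and its weights are assumptions. With these weights, any single mistyped digit or swap of adjacent digits in an ID of up to eight digits changes the checksum.

```python
WEIGHT_START, WEIGHT_STEP = 17, 131  # hypothetical position weights


def make_checksum(respondent_id: str) -> str:
    """Hypothetical five-digit checksum: weighted digit sum modulo 100000."""
    total = sum(int(d) * (WEIGHT_START + WEIGHT_STEP * i)
                for i, d in enumerate(respondent_id))
    return f"{total % 100_000:05d}"


def validate(respondent_id: str, checksum: str) -> bool:
    """Recompute the checksum and compare it with the keyed one."""
    return make_checksum(respondent_id) == checksum
```

Usage: `make_checksum("1001")` gives `"00427"` (1 × 17 + 1 × 410), while the mistyped ID `"1002"` no longer validates against that checksum.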


Double-data entry 


To check the reliability of the data entry within respective participating countries, national centers were required to have all of the paper-based principal, ICT coordinator, and teacher questionnaires entered by two different staff members. IEA recommended that national centers begin the double-data entry process as early as possible during the data capture period in order to identify possible systematic or incidental misunderstandings or mishandlings of data entry rules and to initiate appropriate remedial actions, for example, retraining staff. Those entering the data were required to resolve identified discrepancies between the first and second data entries by consulting the original questionnaire and applying the international rules in a uniform way.


While it was desirable that each and every discrepancy be resolved before submission of the 
complete dataset, the acceptable level of disagreement between the first and second data entry 
was established at one percent or less; any value above this level required complete re-entry of the 
data. This restriction guaranteed that the margin of error observed for processed data remained 
well below the required threshold. 


The level of disagreement between the first and second data entry was evaluated by IEA. Data for those countries that had administered paper-based questionnaires and submitted an IEA DME database showed no differences between the main files and the files created for the purpose of double-data entry.
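The one-percent criterion can be expressed as a simple comparison of the two entries. This is a sketch under assumptions: each entry is represented as a dictionary keyed by hypothetical (respondent ID, variable) pairs, and the function name is illustrative rather than part of the IEA DME.

```python
def compare_entries(first, second):
    """Compare two independent entries of the same questionnaires.

    first, second: dicts mapping (respondent_id, variable) -> keyed value.
    Returns the share of values that disagree and the list of discrepancies.
    """
    keys = set(first) | set(second)
    if not keys:
        return 0.0, []
    discrepancies = sorted(k for k in keys if first.get(k) != second.get(k))
    return len(discrepancies) / len(keys), discrepancies


MAX_DISAGREEMENT = 0.01  # one percent, as stated in the text

# Hypothetical example: one keying error among 200 values.
first = {("T0101", f"Q{i:02d}"): 1 for i in range(200)}
second = dict(first)
second[("T0101", "Q07")] = 2
rate, discrepancies = compare_entries(first, second)
needs_reentry = rate > MAX_DISAGREEMENT
```

Here the disagreement rate is 0.5 percent, so the data would not need complete re-entry, although the single discrepancy would still be resolved against the original questionnaire.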


Data verification at the national centers 


Before sending the data to IEA for further processing, national centers were to carry out mandatory validation and verification steps on all entered data and apply corrections as necessary. The corresponding routines were included in the IEA DME software, which automatically and systematically checked data files for duplicate ID codes and data outside the defined valid ranges or value schemes. Data managers reviewed the corresponding reports, resolved any inconsistencies, and (where possible) corrected problems by looking up the original survey questionnaires. Data managers also verified that all returned non-empty questionnaires had definitely been entered. They also checked that the availability of data corresponded to the participation indicator variables and entries on the tracking forms and as entered in the IEA WinW3S.


In addition to submitting the data files described above, national centers provided IEA with detailed data documentation, including hard copies or electronic scans of all original student and teacher tracking forms and a report on data-capture activities collected as part of the online survey activities questionnaire. IEA already had access, as part of the layout verification process, to electronic copies of the national versions of all questionnaires and the final national adaptation forms.


While the questionnaire data were being entered, each national center used the information from 
the teacher tracking forms to verify the completeness of the materials. Participation information 
(e.g., whether the concerned teacher had left the school permanently between the time of sampling 
and the time of administration) was entered via the IEA WinW3S. 


This process was also supported by the option in the IEA WinW3S to generate an inconsistency report. This report listed all discrepancies between variables recorded during the within-school sampling and test administration process and so made it possible to cross-check these data against the actual availability of data entered in the IEA DME, the database for online respondents, and the uploaded student data on the central international server. Data managers were requested to resolve these problems before final data submission to IEA. If inconsistencies remained or the national center could not resolve them, IEA asked the center to provide documentation on these problems. IEA used this documentation when processing the data at a later stage.
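The core of such an inconsistency report is a cross-check between recorded participation and actual data availability. The sketch below is a hypothetical illustration, not the IEA WinW3S implementation: participation is a simple boolean per respondent ID, and the message texts are invented for the example.

```python
def inconsistency_report(tracking, ids_with_data):
    """Cross-check tracking information against data availability.

    tracking: respondent ID -> participated (bool), as recorded during tracking.
    ids_with_data: set of IDs for which questionnaire or test data exist.
    """
    report = []
    for rid, participated in sorted(tracking.items()):
        has_data = rid in ids_with_data
        if participated and not has_data:
            report.append((rid, "participating, but no data found"))
        elif has_data and not participated:
            report.append((rid, "data found, but not recorded as participating"))
    for rid in sorted(ids_with_data - set(tracking)):
        report.append((rid, "data found for an ID missing from tracking"))
    return report


# Hypothetical tracking records and uploaded data.
tracking = {"S01": True, "S02": True, "S03": False}
report = inconsistency_report(tracking, {"S01", "S03", "S09"})
```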


Confirming the integrity of the national databases 


Overview 


As described earlier in this chapter, national centers in each participating country were responsible for entering their national ICILS 2018 data into the appropriate data files and submitting these files to IEA. Furthermore, the data from the online questionnaires were automatically stored on a central international server. IEA then subjected these data to a comprehensive process of checking and editing. To facilitate the data cleaning process, IEA asked the national centers to provide detailed documentation of their data together with their national data files. The data documentation included copies of all original survey tracking forms, the national versions of test booklets and questionnaires, as well as information from the survey activities questionnaire (see details in Chapter 6). National centers also submitted their final national adaptation forms in order to provide and confirm complete documentation on all national adaptations. In addition, national centers were asked to provide documentation on all changes or edits applied to the data prior to submission, as well as any verified findings that could remain.


Ensuring the integrity of the international database required close cooperation between the international and national institutions involved in ICILS 2018. After each country had submitted its data and required documentation, IEA, in collaboration with the national research coordinators (NRCs), conducted a four-step cleaning procedure on the submitted data and documentation:

(1) Documentation and structure check;

(2) ID variable cleaning;

(3) Linkage cleaning; and

(4) Background cleaning (resolving inconsistencies in questionnaire data).




The cleaning process was an iterative process. Numerous iterations of the four-step cleaning 
procedure were completed on each national data set. This repetition ensured that all data were 
properly cleaned and that any new errors that could have been introduced during the data cleaning 
were rectified. The cleaning process was repeated as many times as necessary until all data were 
consistent and comparable. Any inconsistencies detected during the cleaning process were resolved 
in collaboration with national centers, and all corrections made during the cleaning process were 
documented in a cleaning report produced for each country. 


During the first step, IEA checked the data files provided by each country. In the following steps, 
they applied a set of over 120 cleaning rules to verify the validity and consistency of the data and 
documented any deviations from the international file structure. 


Having completed this work, IEA staff sent queries to the national centers. These required the 
centers to either confirm IEA’s proposed data-editing actions or provide additional information to 
resolve inconsistencies. After all modifications had been applied, IEA rechecked all datasets. This 
process of editing the data, checking the reports, and implementing corrections was repeated as 
many times as necessary to help ensure that data were consistent within and comparable across 
countries. 


After the national files had been checked, IEA provided national centers with univariate statistics 
at the national and international levels. This material enabled national centers to compare their 
national data with the international results as they were included in the draft international report 
and related data and documentation. 


This step was one of the most important quality measures implemented, because it helped to 
ensure the comparability of the data across countries. For example, a particular statistic that 
might have seemed plausible within a national context could have appeared as an outlier when the 
national results were compared with the international results. The outlier could hint at an error in
translation, data capture, coding, etc. The international team reviewed all such instances and, where 
necessary, addressed them, for example, by recoding the corresponding variables in appropriate 
ways or, if errors could not be corrected, removing them from the international database. 


Once the national databases had been verified and formatted according to the international file 
format, IEA sent data to the ISC, which then produced and subsequently reviewed the basic item 
statistics. At the same time, IEA produced data files containing information on the participation 
status of schools, students, and teachers in each country’s sample. IEA then used this information, 
together with data captured by the software designed to standardize operations and tasks, to 
calculate sampling weights, population coverage, and school, teacher, and student participation 
rates. Chapter 7 of this report provides details about the weighting procedures. 


In a subsequent step, the ISC estimated CIL performance and CT scores as well as questionnaire
indices for students, teachers, and schools (see Chapters 11 and 12 for scaling methods and 
procedures). On completing their verification of the sampling weights and scale scores, the ISC 
sent these derived variables to IEA for inclusion in the international database and for distribution 
to the national centers. 


Data cleaning quality control 


Because ICILS 2018 was a large and highly complex study with high standards for data quality, 
maintaining these standards required an extensive set of interrelated data checking and data 
cleaning procedures. To ensure all procedures were conducted in the correct sequence, that 
no special requirements were overlooked, and that the cleaning process was implemented 
independently of the persons in charge, the data quality control included the following steps: 


• Thorough testing of all data cleaning programs: Before applying the programs to real datasets, IEA applied them to simulation datasets containing all possible problems and inconsistencies.

• Registering all incoming data and documents in a specific database: IEA recorded the date of arrival as well as specific issues requiring attention.

• Carrying out data cleaning according to strict rules: Deviations from the cleaning sequence were not possible, and the scope for involuntary changes to the cleaning procedures was minimal.

• Documenting all systematic data recodings that applied to all countries: IEA recorded these in the ICILS 2018 general cleaning documentation for the main survey.

• Logging every “manual” correction to a country’s data files in a recoding script: Logging these changes, which only occurred occasionally, allowed IEA to undo changes or to redo the whole manual cleaning process at any later stage of the data-cleaning process.

• Repeating, on completion of data cleaning for a country, all cleaning steps from the beginning: This step allowed IEA to detect any problems that might have been inadvertently introduced during the data-cleaning process.

• Working closely with national centers at various steps of the cleaning process: IEA provided national centers with the processed data files and accompanying documentation and statistics so that center staff could thoroughly review and correct any identified inconsistencies.

IEA compared national adaptations recorded in the documentation for the national datasets against the structure of the submitted national data files. IEA then recorded any identified deviations from the international data structure in the national adaptation database and in the ICILS 2018 user guide for the international database (Mikheeva and Meyer, 2020). Whenever possible, IEA recoded national deviations to ensure consistency with the international data structure. However, if international comparability could not be guaranteed, IEA removed the corresponding data from the international database.


Preparing national data files for analysis


The main objective of the data cleaning process was to ensure that the data adhered to international 
formats, that school, teacher, and student information could be linked across different survey data 
files, and that the data reflected the information collected within each country in an accurate and 
consistent manner. 


The program-based data cleaning consisted of the following activities (summarized in Figure 10.1 and explained in the following subsections). IEA carried out all of these activities in close communication with the national centers.


Checking documentation, import, and structure


For each country, data cleaning began with an exploratory review of its data file structures and data documentation (i.e., national adaptation forms, student tracking forms, teacher tracking forms, and survey activities questionnaire).

IEA began data cleaning by combining the tracking information and sampling information captured in the IEA WinW3S database with the student-level database containing the corresponding student instrument data. During this step, IEA staff also retrieved the data from the principal, ICT coordinator, and teacher questionnaires for both the online and paper administration modes. This step also saw data from the different sources being transformed and imported into one structured query language (SQL) database so that this information would be available during further data processing stages.

The first checks identified differences between the international and national file structures. Some countries made adaptations (such as adding national variables or omitting or modifying international variables) to their questionnaires. The extent and nature of such changes differed across countries: some countries administered the questionnaires without any modifications (apart from translations and necessary adaptations relating to culture or language-specific terms), whereas other countries inserted response categories within existing international variables or added national variables.
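The combination step described at the start of this subsection (tracking information joined with instrument data from both administration modes in one SQL database) can be sketched with Python's built-in `sqlite3` module. The table and column names are hypothetical, and the actual IEA database schema is not described in this chapter. The `LEFT JOIN` keeps every tracked teacher, so respondents without questionnaire data remain visible for later checks.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Hypothetical table and column names.
CREATE TABLE teacher_tracking (idteach INTEGER PRIMARY KEY, idschool INTEGER, participated INTEGER);
CREATE TABLE teacher_online (idteach INTEGER PRIMARY KEY, q1 INTEGER);
CREATE TABLE teacher_paper  (idteach INTEGER PRIMARY KEY, q1 INTEGER);
INSERT INTO teacher_tracking VALUES (100101, 1001, 1), (100102, 1001, 1), (100103, 1001, 0);
INSERT INTO teacher_online VALUES (100101, 2);
INSERT INTO teacher_paper  VALUES (100102, 3);
-- Stack both administration modes, then attach them to the tracking records.
CREATE VIEW teacher_combined AS
SELECT t.idteach, t.idschool, t.participated, q.mode, q.q1
FROM teacher_tracking AS t
LEFT JOIN (
    SELECT idteach, 'online' AS mode, q1 FROM teacher_online
    UNION ALL
    SELECT idteach, 'paper' AS mode, q1 FROM teacher_paper
) AS q ON q.idteach = t.idteach;
""")
rows = con.execute("SELECT * FROM teacher_combined ORDER BY idteach").fetchall()
```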


To keep track of adaptations, IEA asked the national centers to complete national adaptation forms 
while they were adapting the international codebooks. Where necessary, IEA modified the structure 
and values of the national data files to ensure that the resulting data remained comparable across 
countries. Details about country-specific adaptations to the international instruments can be found 
in Appendix 2 of the ICILS 2018 user guide for the international database (Mikheeva and Meyer, 2020). 
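Recoding a national structure back to the international one can be sketched as a per-variable mapping. The adaptation shown is hypothetical: a country is assumed to have split international category 3 of a variable `Q05` into two national categories, and a national-only variable `NAT_Q99` is assumed not to be carried into the international file. Values without a mapping, such as missing codes, pass through unchanged.

```python
# Hypothetical adaptation: national categories 3 and 4 of Q05 both
# correspond to international category 3.
RECODE_MAP = {"Q05": {1: 1, 2: 2, 3: 3, 4: 3}}
NATIONAL_ONLY = {"NAT_Q99"}  # hypothetical national variable, not kept internationally


def to_international(record):
    """Return a copy of a national record in the international structure."""
    result = {}
    for var, value in record.items():
        if var in NATIONAL_ONLY:
            continue
        mapping = RECODE_MAP.get(var, {})
        result[var] = mapping.get(value, value)  # unmapped values pass through
    return result
```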


At this stage, IEA discarded variables created purely for verification purposes during data entry, and 
made provision for new variables necessary for analysis and reporting. These included reporting 
variables, derived variables, sampling weights, and scale scores. 


Once IEA had ensured that each data file matched the international format, they applied a series 
of standard data cleaning rules for further processing. Processing during this step employed 
software developed by IEA that could identify and correct inconsistencies in the data. Each 
potential problem flagged at this stage was identified by a unique problem number, described 
and recorded in a database alongside the specific action taken by the cleaning program or IEA in 
relation to the problem. 


IEA referred problems that could not be rectified automatically to the responsible NRC so that the national centers could check the original data collection instruments and tracking forms to trace the source of these errors. Wherever possible, IEA suggested a remedy and asked the national centers to either accept it or propose an alternative. If a national center could not resolve problems through verification of the instruments or forms, IEA applied a general cleaning rule to the files to rectify the error. When all automatic updates had been applied, IEA ran individual recodings to apply any remaining corrections directly to the data files.


Cleaning identification (ID) variables

Each record in a data file needs to have a unique ID number. The existence of records with duplicate ID numbers in a file implies an error of some kind. If two records in an ICILS 2018 database shared the same ID number and contained exactly the same data, IEA deleted one of the records and kept the other one in the database. If both records contained different data and IEA found it impossible to identify which record contained the “true data,” they removed both records from the database. IEA tried to keep such losses to a minimum; actual deletions were very rare.
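The basic duplicate rule can be sketched as follows. This is a simplification: in ICILS 2018, conflicting records were first investigated with the national centers and tracking forms, whereas this hypothetical function drops every ID whose records conflict. The `IDSTUD` and `ITEM1` field names are assumptions for the example.

```python
from collections import defaultdict


def resolve_duplicate_ids(records, id_field="IDSTUD"):
    """Return (kept records, IDs removed because their duplicates conflict)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[id_field]].append(rec)
    kept, removed = [], []
    for rid, group in groups.items():
        if all(rec == group[0] for rec in group):
            kept.append(group[0])   # unique or exact duplicates: keep one record
        else:
            removed.append(rid)     # conflicting data, "true" record unknown
    return kept, removed


# Hypothetical records.
records = [
    {"IDSTUD": 1, "ITEM1": 1}, {"IDSTUD": 1, "ITEM1": 1},   # exact duplicate
    {"IDSTUD": 2, "ITEM1": 1}, {"IDSTUD": 2, "ITEM1": 0},   # conflicting
    {"IDSTUD": 3, "ITEM1": 0},
]
kept, removed = resolve_duplicate_ids(records)
```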


Although the ID cleaning covered all data from all instruments, it focused mainly on the student file. This step in data cleaning included the preparation of the student test records provided to IEA by RM Results. Due to the administration of the student test on USB sticks, data uploaded after test sessions often contained several student records within a country with the same student ID. In most cases, such records were duplicates. In extreme cases, students from an entire school had the same student ID.


The possible sources of multiple records were tracked back to the test administration procedures 
at schools or technical constraints of the student test delivery software. Depending on the nature 
of a multiple session, the records were used for processing, deleted, re-identified, or merged. In 
addition to checking the unique student ID number, it was crucial to check variables pertaining to 
student participation and exclusion status, as well as students’ dates of birth and dates of testing 
in order to calculate student age at the time of testing. The student tracking forms provided an 
important tool for resolving anomalies in the database. The records were cleaned by IEA with 
confirmation from national centers for individual records. Further details on cleaning of multiple 
records are reflected in Figure 10.1 below. 
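Student age at the time of testing follows from the birth and testing dates. The chapter does not give the exact formula, so this sketch assumes month precision, with the function name chosen for illustration.

```python
def age_at_testing(birth_year, birth_month, test_year, test_month):
    """Age in years at the time of testing, assuming month precision."""
    return (test_year - birth_year) + (test_month - birth_month) / 12


age = age_at_testing(2004, 5, 2018, 3)   # born May 2004, tested March 2018
```

Here the student is 14 years minus two months old, about 13.83 years.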


Figure 10.1: Overview of data processing at IEA

[Figure: Inputs (IEA WinW3S data, online data, IEA DME data and codebooks, and documentation) pass through four steps, namely structure check, ID cleaning, linkage cleaning, and background cleaning, with ongoing communication between IEA and the national centers. Outputs are the international database, statistics, reports, and documentation.]


As mentioned earlier, IEA conducted all cleaning procedures in close cooperation with the national centers. After national centers had cleaned the ID variables, the clean databases with information about student participation and exclusion were used by the IEA sampling section to calculate students’ participation rates, exclusion rates, adjudication flags, and student sampling weights (see Chapter 7 for details).


Checking linkages

In ICILS 2018, data about students, their schools, and teachers appeared in a number of different data files at the respective levels. Correctly linking these records to provide meaningful data for analysis and reporting was therefore vital. Linkage was implemented through a hierarchical ID numbering system as described in Chapter 8 (Table 8.1) of this report. Student ID numbers in the main student data file had to be matched correctly with those in the reliability scoring file. Ensuring that teacher and student records linked to their corresponding schools was also important.
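Linkage checks of this kind can be sketched as below. The ID layout is an assumption made for illustration only (student and teacher IDs beginning with the four digits of their school ID); the actual ICILS 2018 numbering system is the one given in Table 8.1. The function also checks that every ID in the reliability scoring file exists in the main student file.

```python
def check_links(schools, students, teachers, reliability_ids, school_digits=4):
    """Return linkage problems as (level, id, description) tuples.

    Assumes (hypothetically) that student and teacher IDs begin with the
    digits of their school ID.
    """
    school_ids = {s["IDSCHOOL"] for s in schools}
    problems = []
    for level, records, key in (("student", students, "IDSTUD"),
                                ("teacher", teachers, "IDTEACH")):
        for rec in records:
            embedded_school = int(str(rec[key])[:school_digits])
            if embedded_school != rec["IDSCHOOL"]:
                problems.append((level, rec[key], "ID does not embed its school ID"))
            elif rec["IDSCHOOL"] not in school_ids:
                problems.append((level, rec[key], "school not in school file"))
    student_ids = {s["IDSTUD"] for s in students}
    for sid in sorted(set(reliability_ids) - student_ids):
        problems.append(("reliability", sid, "not in main student file"))
    return problems


# Hypothetical records.
schools = [{"IDSCHOOL": 1001}]
students = [{"IDSTUD": 10010101, "IDSCHOOL": 1001},
            {"IDSTUD": 10020101, "IDSCHOOL": 1001}]
teachers = [{"IDTEACH": 100201, "IDSCHOOL": 1002}]
problems = check_links(schools, students, teachers, [10010101, 10010199])
```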




Resolving inconsistencies in questionnaire data

The number of inconsistent and implausible responses in questionnaire data files varied considerably among countries, and none were completely free of inconsistent responses. IEA determined the treatment of inconsistent responses on a question-by-question basis, using all available documentation to make an informed decision. IEA also checked all questionnaire data for consistency across the responses given.


For example, Question 3 in the principal questionnaire asked for the total school enrolment (number of boys and number of girls, respectively) in all grades, while Question 4 asked for the enrolment in the target grade only. Clearly, the number given as a response to Question 4 could not exceed the number provided in Question 3. Similarly, it was not possible for the sum of all full-time teachers and part-time teachers, as asked in Question 6, to equal zero. In another example, Question 7 of the ICT coordinator questionnaire asked for the total number of ICT devices in the school and the number of ICT devices available to students. The total number of ICT devices in the school could not be smaller than the number available to students.


IEA flagged inconsistencies of this kind and then asked the national centers to review them. IEA recoded as “omitted” those cases that could not be corrected or where the response provided was not usable for analysis.
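The consistency rules from these examples can be written as simple checks that only flag records for review. The field names are hypothetical, and checking boys and girls separately is an assumption based on the question asking for both counts. Missing values (`None`) skip a check rather than trigger it.

```python
def _present(*values):
    return all(v is not None for v in values)


def check_principal(rec):
    """Flag impossible combinations of principal questionnaire counts."""
    flags = []
    for group in ("boys", "girls"):
        total, target = rec[f"total_{group}"], rec[f"target_{group}"]
        if _present(total, target) and target > total:
            flags.append(f"target-grade {group} exceed total {group}")
    full, part = rec["teachers_full_time"], rec["teachers_part_time"]
    if _present(full, part) and full + part == 0:
        flags.append("full-time plus part-time teachers equals zero")
    return flags


def check_ict(rec):
    """Flag schools reporting fewer ICT devices in total than for students."""
    total, for_students = rec["devices_total"], rec["devices_students"]
    if _present(total, for_students) and total < for_students:
        return ["fewer devices in total than available to students"]
    return []


# Hypothetical principal record with two problems.
principal = {"total_boys": 300, "total_girls": 280, "target_boys": 320,
             "target_girls": 60, "teachers_full_time": 0, "teachers_part_time": 0}
```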


Filter questions, which appeared in some questionnaires, directed respondents to a particular sub-question or further section of the questionnaire. IEA applied the following cleaning rule to these filter questions and the dependent questions that followed: if the answer to the filter question was “no” or “not applicable,” any responses to the dependent questions were recoded as “logically not applicable.”
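The filter rule amounts to overwriting the dependent variables whenever the filter answer routes the respondent past them. In this sketch, the response codes for “no” and “not applicable” are hypothetical, and a text label stands in for the numeric missing code used in the database.

```python
LOGICALLY_NOT_APPLICABLE = "logically not applicable"  # stands in for the numeric code
FILTER_SKIP_CODES = {2, 3}  # hypothetical codes for "no" and "not applicable"


def apply_filter_rule(record, filter_var, dependent_vars):
    """Recode dependent questions when the filter answer routes past them."""
    result = dict(record)
    if record.get(filter_var) in FILTER_SKIP_CODES:
        for var in dependent_vars:
            result[var] = LOGICALLY_NOT_APPLICABLE
    return result
```

For example, a respondent who answered “no” (code 2) to a hypothetical filter `F1` but still answered `F1A` would have `F1A` recoded, while a “yes” answer leaves the dependent responses untouched.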


IEA also applied what is known as a split variable check to questions where the answer was coded into several variables. For example, Question 26 in the student questionnaire asked students: “At school, have you learned about the importance of the following topics?” Student responses were captured in a set of four variables, each one coded as “Yes” if the corresponding “Yes” option was checked and “No” if the “No” option was filled in. Occasionally, students checked the “Yes” boxes but left the “No” boxes unchecked. Because, in these cases, it was clear that the unchecked boxes actually meant “No,” these responses were recoded accordingly, provided that the students had given affirmative responses in the other categories.
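A split variable check for one question can be sketched as follows. The codes (1 = Yes, 2 = No, 9 = omitted) are assumptions, and so is the precise trigger: this sketch recodes omitted boxes only when at least one “Yes” was checked and no “No” box was filled in anywhere in the set, which is one reading of the rule described above.

```python
YES, NO, OMITTED = 1, 2, 9  # assumed response codes


def split_variable_check(values):
    """Recode unchecked boxes to "No" for one split question.

    Applied only when at least one "Yes" was checked and no "No" box was
    filled in, i.e., the respondent used only the "Yes" column.
    """
    if YES in values and NO not in values:
        return [NO if v == OMITTED else v for v in values]
    return list(values)
```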


Resolving inconsistent tracking and questionnaire information

Two different sets of ICILS 2018 data indicated age and gender for both teachers and students. The first set was tracking information provided by the school coordinator or test administrator throughout the within-school sampling and test/questionnaire administration process. The second set comprised the actual responses given by individuals in the contextual questionnaires. In some cases, data across these two sets did not match and resolution was needed.

If the information on gender or birth year and month was missing in the student questionnaire but the student participated, then these data were copied over from the tracking information to the questionnaire, if available.

The teacher questionnaire did not ask teachers to provide birth year and month but rather to choose between five age ranges. Year of birth, which was indicated in the tracking forms, was then recoded into age groups and cross-checked against the range indicated by the questionnaire responses. If gender and/or age range information was missing from the teacher questionnaire but the teacher participated, these data were copied over from the tracking information to the questionnaire.


If discrepancies were found between tracking and questionnaire gender and age data, the 
questionnaire information (for both teachers and students) replaced the tracking information. 
However, for teacher birth year, tracking information was set to missing given that only an age 
range, not a specific year, was indicated. 
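The teacher rules can be combined into one reconciliation step, sketched below under several assumptions: the lower bounds of the five age ranges are hypothetical placeholders (the actual ranges are those of the ICILS 2018 teacher questionnaire), age is approximated as testing year minus birth year, and the dictionary keys are invented for the example.

```python
from bisect import bisect_right

# Hypothetical lower bounds (in years) of the five questionnaire age ranges.
AGE_RANGE_LOWER_BOUNDS = [0, 30, 40, 50, 60]


def age_range(birth_year, reference_year):
    """Map a birth year to a 1-based age range (approximate: ignores month)."""
    return bisect_right(AGE_RANGE_LOWER_BOUNDS, reference_year - birth_year)


def reconcile_teacher(tracking, questionnaire, reference_year):
    """Apply the tracking/questionnaire rules to one teacher; returns new copies."""
    t, q = dict(tracking), dict(questionnaire)
    tracked_range = (age_range(t["birth_year"], reference_year)
                     if t.get("birth_year") is not None else None)
    if t.get("participated"):
        # Missing questionnaire answers are filled in from the tracking data.
        if q.get("gender") is None:
            q["gender"] = t.get("gender")
        if q.get("age_range") is None:
            q["age_range"] = tracked_range
    # Where the sources disagree, the questionnaire answer replaces tracking.
    if q.get("gender") is not None and q["gender"] != t.get("gender"):
        t["gender"] = q["gender"]
    if (q.get("age_range") is not None and tracked_range is not None
            and q["age_range"] != tracked_range):
        t["birth_year"] = None  # only a range is known, so the year is set to missing
    return t, q


# Hypothetical teacher whose tracking data conflict with the questionnaire.
t, q = reconcile_teacher({"participated": True, "gender": 1, "birth_year": 1980},
                         {"gender": 2, "age_range": 4}, 2018)
```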




Handling of missing data 

Two types of entries were possible during the ICILS 2018 data capture: valid data values and missing data values. Missing data could be assigned a value of “omitted or invalid” or “not administered” during data capture. With the exception of the “not reached” missing codes assigned at ACER, IEA applied additional missing codes to the data to facilitate further analyses. This process led to four distinct types of missing data in the international database:


• Omitted or invalid: The respondent had a chance to answer the question but did not do so, leaving the corresponding item or question blank. Alternatively, the response was non-interpretable or out of range.

• Not administered: The item or question was not administered to the respondent, who therefore could not read and answer it. This code was used for student test variables that were not in the set of modules administered to a student, either deliberately (due to the rotation of modules) or, in very few cases, because of technical failure or incorrect translations. It was also used for records that were included in the international database but did not contain a single response to one of the assigned questionnaires. This applied to students who participated in the student test but did not answer the student questionnaire, and to schools where only one of the principal or ICT coordinator questionnaires was returned with responses. In addition, the code was used for individual questionnaire items that were not administered in a national context because the country removed the corresponding question from the questionnaire or because the translation was incorrect.

• Logically not applicable: The respondent answered a preceding filter question in a way that made the following dependent questions not applicable.

• Not reached (this applied only to the individual items of the student test): This code indicated items that students were believed not to have reached because of a lack of time. An item received this code if the student did not respond to any of the items following it within the same booklet¹ (i.e., the student did not complete any of the remaining test questions), did not respond to the item preceding it, and did not have sufficient time to finish a module in the booklet.
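The “not reached” rule can be sketched as follows. This is a minimal illustration, not IEA's or ACER's processing code: it assumes that one module's responses are held in a list with `None` for a blank, and that a hypothetical `timed_out` flag records whether the student ran out of time on that module. The names `Missing` and `code_module` are illustrative only.

```python
from enum import Enum
from typing import List, Optional, Sequence, Union


class Missing(Enum):
    """The four missing-data types described above."""
    OMITTED = "omitted or invalid"
    NOT_ADMINISTERED = "not administered"
    NOT_APPLICABLE = "logically not applicable"
    NOT_REACHED = "not reached"


Code = Union[int, Missing]


def code_module(responses: Sequence[Optional[int]], timed_out: bool) -> List[Code]:
    """Assign missing codes to one module's item responses (None = blank).

    Hypothetical helper: blanks default to 'omitted'. If the student ran out
    of time, each blank in the trailing run of blanks whose preceding item was
    also blank is recoded as 'not reached'.
    """
    codes: List[Code] = [r if r is not None else Missing.OMITTED for r in responses]
    if not timed_out:
        return codes
    # Locate the start of the trailing run of unanswered items.
    start = len(responses)
    while start > 0 and responses[start - 1] is None:
        start -= 1
    # The first blank of the run follows an answered item, so it stays 'omitted';
    # every later blank in the run has an unanswered predecessor.
    for i in range(start + 1, len(responses)):
        codes[i] = Missing.NOT_REACHED
    return codes


print(code_module([1, 0, None, 1, None, None, None], timed_out=True))
```

In this sketch the isolated blank (third item) and the first blank of the trailing run remain “omitted,” while the last two items become “not reached.”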


Checking the interim data products 


Building the international database was an iterative process. Once IEA completed each major data 
processing step, it sent a new version of the data files to the national centers so that they could 
review their data and run their own separate checks to validate the new data file versions. This 
process implied that national centers received several versions of their data, and their data only, 
before release of the draft and final versions of the international databases. All interim data were 
made available in full to the ISC, whereas each participating country received only its own data. 


IEA sent the first version of the data and accompanying documentation to national centers on October 30, 2018. This release covered all countries that administered ICILS in the first half of 2018; countries that administered ICILS in the second half of 2018 received their data on January 16, 2019. This first version of each country dataset included the following data and documentation:


• School-, student-, and teacher-level SPSS and SAS data files;

• Univariate descriptive statistics for all variables in the data files;

• A cleaning report that included a list of structural and case-level findings;

• Recoding documentation for country-specific data edits applied by IEA; and


1. The term booklet is used here in reference to any possible combination of test modules. 


DATA MANAGEMENT AND CREATION OF THE ICILS 2018 DATABASE 131 


• Cleaning documentation describing the initial cleaning procedures undertaken at IEA and the data files and statistics provided.


IEA provided the ISC with subsequent versions of the data and related documentation as soon as it had implemented feedback from national centers. These later versions of the data files were accompanied by the sampling weights and international achievement scores as soon as these became available. During this stage of data processing, IEA asked countries to review the documentation on adaptations to the national versions of their instruments and the related edits applied to the data files.


In May 2019, all national centers received the data from all other ICILS 2018 countries. This data version is called the draft international database because it broadly reflects the structure of the released databases. Most notably, and in contrast to earlier data releases, this version included sampling weights and scales. All staff at the national centers had to sign a confidentiality agreement undertaking not to share any ICILS 2018 data products or information about the results.


During the fifth NRC meeting in Jyväskylä, Finland, in June 2019, NRCs had the opportunity to raise any further issues concerning their data that had not yet been raised.


In August 2019, IEA provided NRCs with an updated version of the draft international database. This update was necessary to incorporate minor edits, arising from internal quality control procedures, of which IEA and ACER had been made aware. The ISC used this version of the data to produce the updated, final tables for the international report.


The ICILS 2018 international database 


The ICILS 2018 international database incorporated all national data files from participating 
countries. The data processing and validation at the international level helped to ensure that: 


• Information coded in each variable was internationally comparable;

• National adaptations were reflected appropriately in all variables;

• Questions that were not internationally comparable were removed from the database;

• All entries in the database could be linked to the appropriate respondent (student, teacher, principal, or ICT coordinator);

• Only those records considered as participating (following adjudication) remained in the international database files;

• Sampling weights and student achievement scores were available for international comparisons; and

• Indirect identification of individuals was prevented by applying confidentiality measures, such as scrambling ID variables or removing some of the personal data variables that were needed only during field operations and data processing.


More information about the ICILS 2018 international database is provided in the ICILS 2018 user 
guide for the international database (Mikheeva and Meyer, 2020). 




Summary 


To achieve a high-quality database, ICILS 2018 implemented a series of data management procedures, including checks to ensure the consistency of national database structures, the proper documentation of all national adaptations, and the comparability of international variables across national datasets. IEA reviewed all national databases in cooperation with national centers and the larger international team. The review process followed a series of thorough checking procedures, which led to the creation of the final ICILS 2018 database. The final data products included item statistics, national data files, and the international database, accompanied by a user guide and supplementary information.


References 


Mikheeva, E., & Meyer, S. (Eds.). (2020). IEA International Computer and Information Literacy Study 2018 user guide for the international database. Amsterdam, the Netherlands: International Association for the Evaluation of Educational Achievement (IEA). https://www.iea.nl/publications/user-guides/icils-2018-user-guide-international-database


CHAPTER 11: 


Scaling procedures for ICILS 2018 test items


Louise Ockwell, Alex Daraganov, and Wolfram Schulz 


Introduction 


This chapter describes the procedures used to analyze and scale the ICILS 2018 test items that 
were administered to measure students’ computer and information literacy (CIL) and computational 
thinking (CT). It covers the following topics: 


• The scaling model used to analyze and scale the test items;
• Test coverage;
• Item dimensionality and local dependence;
• Assessment of item fit;
• Assessment of scorer reliabilities for open-ended items;
• Differential item functioning by gender;
• Review of cross-national measurement equivalence;
• International item adjudication;
• International item calibration and test reliability;
• International ability estimates (plausible values and weighted likelihood estimates); and
• Estimation of changes in students’ CIL between 2013 and 2018.


The development of the CIL and CT test items is described in Chapter 2 and was guided by the ICILS 2018 assessment framework (see Fraillon et al. 2019).


The scaling model 
Item response theory (IRT) scaling methodology was used to scale the test items. For dichotomous items, we used the one-parameter (Rasch) model (Rasch 1960), which models the probability of selecting category 1 instead of 0 as:

$$P_i(\theta_n) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}$$

where $P_i(\theta_n)$ is the probability for person $n$ to score 1 on item $i$, $\theta_n$ denotes the estimate of person $n$’s location on the latent continuum (which in proficiency tests is commonly referred to as person ability), and $\delta_i$ is the estimated location of item $i$ on the same latent continuum (which in proficiency tests is commonly referred to as item difficulty). For each item, item responses are modeled as a function of the latent trait $\theta_n$.
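As a quick numerical check of the Rasch model, the sketch below evaluates $P_i(\theta_n)$ directly. The function name `rasch_prob` is illustrative only and is not part of ACER ConQuest or any other scaling software; the item location of 1.40 is borrowed from the R02Z example discussed later in this chapter.

```python
import math


def rasch_prob(theta: float, delta: float) -> float:
    """Probability that a person at location theta scores 1 on an item at delta."""
    # exp(theta - delta) / (1 + exp(theta - delta)), written in the
    # algebraically equivalent logistic form 1 / (1 + exp(-(theta - delta))).
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


# A student located exactly at the item's difficulty has a 50 percent chance of success;
# one logit above it, the probability rises to about 0.73.
print(rasch_prob(1.40, 1.40))            # 0.5
print(round(rasch_prob(2.40, 1.40), 2))  # 0.73
```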


In the case of items with more than two categories (in the CIL and CT assessments, items with more than one score point), this model can be generalized to the (Rasch) partial credit model (Masters and Wright 1997), which takes the form of:

$$P_{x_i}(\theta_n) = \frac{\exp\sum_{k=0}^{x}(\theta_n - \delta_i + \tau_{ik})}{\sum_{h=0}^{m_i}\exp\sum_{k=0}^{h}(\theta_n - \delta_i + \tau_{ik})}, \qquad x_i = 0, 1, \ldots, m_i$$

where $P_{x_i}(\theta_n)$ denotes the probability of person $n$ scoring $x$ on item $i$, $\theta_n$ denotes the estimate of person $n$’s location on the latent continuum, the item parameter $\delta_i$ denotes the estimated average location (across the categories) of the item on the latent continuum, $m_i$ is the maximum score of item $i$, and $\tau_{ik}$ is an additional step parameter that denotes the distance between the estimate of each category boundary and the estimated location of the item on the latent continuum (the term for $k = 0$ is set to zero by convention). ACER ConQuest, Version 4.0 software (Adams et al. 2015) was used to scale the CIL and CT test items for ICILS 2018.
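The partial credit model can be evaluated in the same way. The sketch below is a minimal illustration, assuming the sign convention shown in the equation ($\theta_n - \delta_i + \tau_{ik}$) and a zero exponent for category 0; `pcm_probs` and the parameter values are hypothetical and do not reproduce ConQuest's internals.

```python
import math
from typing import List, Sequence


def pcm_probs(theta: float, delta: float, taus: Sequence[float]) -> List[float]:
    """Category probabilities P(X = 0..m) under the partial credit model.

    taus holds the step parameters tau_i1..tau_im; the k = 0 term of each
    cumulative sum is zero by convention, so category 0 has exponent 0.
    """
    exponents = [0.0]
    running = 0.0
    for tau in taus:
        running += theta - delta + tau
        exponents.append(running)
    # Subtract the largest exponent before exponentiating for numerical stability.
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical two-step item (scores 0, 1, 2) evaluated at theta = 0.5.
probs = pcm_probs(0.5, 0.2, [0.4, -0.4])
print([round(p, 3) for p in probs])
```

With a single step parameter of zero, the function reduces to the dichotomous Rasch probability, which is a convenient sanity check.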




Test coverage and item dimensionality 


When measuring cognitive abilities, it is important to use test items that cover the range of achievement found in the target population. First, we estimated the distribution of CIL and CT among ICILS 2018 students and the location of the corresponding item thresholds (with a response probability rp = 0.5;¹ see Figures 11.1 and 11.2). For dichotomous items, item thresholds were equal to item difficulties. For partial credit items, a difficulty threshold was estimated for each score.²
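The Thurstonian thresholds described in footnote 2 can be located numerically. The sketch below assumes the partial credit parameterization given earlier and uses simple bisection, which works because the probability of scoring $x$ or higher increases with the latent trait under this model. The function names and parameter values are hypothetical; this is not how ConQuest computes its thresholds.

```python
import math
from typing import List, Sequence


def pcm_probs(theta: float, delta: float, taus: Sequence[float]) -> List[float]:
    """Partial credit model category probabilities (k = 0 term set to zero)."""
    exponents, running = [0.0], 0.0
    for tau in taus:
        running += theta - delta + tau
        exponents.append(running)
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def thurstonian_threshold(delta: float, taus: Sequence[float], score: int,
                          lo: float = -10.0, hi: float = 10.0) -> float:
    """Location where P(X >= score) = 0.5, found by bisection.

    P(X >= score) increases with theta under the partial credit model,
    so halving the interval converges to the unique crossing point.
    """
    for _ in range(100):
        mid = (lo + hi) / 2
        if sum(pcm_probs(mid, delta, taus)[score:]) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# For a dichotomous item (one step of zero) the threshold equals the item difficulty.
print(round(thurstonian_threshold(1.40, [0.0], 1), 4))
# Hypothetical partial credit item: one threshold per score point.
print([round(thurstonian_threshold(0.2, [0.4, -0.4], x), 3) for x in (1, 2)])
```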


Figure 11.1: Mapping of CIL student abilities and item difficulties

[Figure: distribution of students’ CIL abilities plotted against the locations of the CIL item thresholds on the same latent scale; labels such as 44.1 and 44.2 denote the first and second score thresholds of a partial credit item]

1 This means that a respondent with the same score on the latent continuum as the item location parameter has a 50 percent probability of giving a correct response.

2 This “Thurstonian” threshold indicates, for each item score, the point on the latent continuum at which a respondent with the same latent trait score has a 50 percent probability of obtaining this item score or higher.


SCALING PROCEDURES FOR ICILS 2018 TEST ITEMS 135 


Figure 11.2: Mapping of CT student abilities and item difficulties

[Figure: distribution of students’ CT abilities plotted against the locations of the CT item thresholds on the same latent scale]


For both the CIL and CT assessments, the range of item difficulties broadly matched the range of student abilities in the population. However, the match between test item difficulty and student ability varied considerably across countries, depending on the distribution of student achievement within each ICILS 2018 country (see Fraillon et al. 2020, pp. 75 and 103).




Assessment of item fit 


Before reviewing the international scales and more specific item statistics in detail, we evaluated the model fit for individual items. One way to determine goodness of fit is to calculate a mean square statistic (Wright and Masters 1982). This residual-based item fit statistic indicates the extent to which each item fits the item response model. However, there are no clear rules for acceptable item fit, and some statisticians recommend that analysts and researchers interpret residual-based statistics with caution (see, for example, Rost and von Davier 1994).
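A weighted (infit) mean square for a dichotomous Rasch item can be sketched as the sum of squared residuals divided by the sum of model variances. This minimal illustration treats person abilities as known, whereas in practice they are estimated, and the data are hypothetical. Values above 1 indicate more noise (less discrimination) than the model predicts; values below 1 indicate responses that are more deterministic than predicted.

```python
import math
from typing import Sequence


def weighted_mnsq(responses: Sequence[int], thetas: Sequence[float], delta: float) -> float:
    """Weighted (infit) mean square for one dichotomous Rasch item.

    Sum of squared residuals (x - P) divided by the sum of model variances
    P(1 - P); values near 1 indicate that the observed responses vary about
    as much as the model predicts.
    """
    squared_residuals = 0.0
    variances = 0.0
    for x, theta in zip(responses, thetas):
        p = 1.0 / (1.0 + math.exp(-(theta - delta)))
        squared_residuals += (x - p) ** 2
        variances += p * (1.0 - p)
    return squared_residuals / variances


abilities = [-2.0, -1.0, 1.0, 2.0]
print(weighted_mnsq([0, 0, 1, 1], abilities, 0.0))  # below 1: more deterministic than the model
print(weighted_mnsq([1, 1, 0, 0], abilities, 0.0))  # above 1: reversed pattern, poor fit
```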


To assess the assumptions of the IRT model, we based our review of item fit on a combination of indicators: mean square statistics, item-rest correlations, percentages of students in each response category, and the average ability of students in each response category. We also reviewed item characteristic curves (ICCs), which provide a graphical representation of the observed success of students on an item compared with the probability of success predicted by the model across the range of student abilities, for both dichotomous and partial credit items.
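The comparison underlying an ICC can be approximated by grouping students by ability and comparing the observed proportion correct in each group with the average model probability. The sketch below is a simplified, hypothetical version for a dichotomous item; it uses equal-sized ability groups, which may differ from the binning used to produce the ICCs in this chapter.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def empirical_icc(responses: Sequence[int], thetas: Sequence[float], delta: float,
                  n_groups: int = 4) -> List[Tuple[float, float, float]]:
    """Observed vs. model-expected proportion correct within ability groups.

    Students are sorted by ability and split into groups of (nearly) equal size;
    for each group the function returns (mean ability, observed proportion
    correct, mean Rasch-model probability).
    """
    pairs = sorted(zip(thetas, responses))
    if not pairs:
        return []
    size = math.ceil(len(pairs) / n_groups)
    rows = []
    for start in range(0, len(pairs), size):
        group = pairs[start:start + size]
        abilities = [t for t, _ in group]
        observed = mean(x for _, x in group)
        expected = mean(1.0 / (1.0 + math.exp(-(t - delta))) for t in abilities)
        rows.append((mean(abilities), observed, expected))
    return rows


# Hypothetical responses of eight students to one item with difficulty 0.
for row in empirical_icc([0, 0, 0, 1, 0, 1, 1, 1], [-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2], 0.0):
    print([round(v, 2) for v in row])
```

Large gaps between the observed and expected columns would correspond to the kind of misfit visible in the ICCs discussed below.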


While the theory and principles underpinning the evaluation of model fit apply to all items, the assessment of psychometric characteristics differs slightly depending on whether items are dichotomous or have more than two categories. In the following sections we present an example review of psychometric characteristics for each of these two item types.


Example dichotomous item 

Figure 11.3 shows the ICC for item R02Z, a dichotomously scored constructed-response item. The observed curve fits the expected model very closely, which is also reflected in the weighted MNSQ of 0.97. The item (location) parameter of 1.40 indicates that this item is moderately difficult; it was retained for the scaling of CIL items.


Figure 11.3: Item characteristic curve by score for dichotomous item R02Z

[Figure: observed and model-predicted probability of a correct response for Item 61 (R02Z) by latent trait (logits); weighted MNSQ 0.97; delta 1.40]




Example partial credit item 

The ICC for item B08Z, an item with 0, 1, or 2 score points, showed that, for scores of 0 and 2, the curves depicting the observed item responses were not entirely consistent with those predicted by the Rasch partial credit model (Figure 11.4). Among students with lower levels of CIL, fewer received a score of 0 and more received a score of 2 than predicted by the model. Conversely, among students of higher ability, fewer had a score of 2 and more had a score of 0 than predicted. This indicates that the item discriminated less well than expected. However, the ICC still indicated that students with higher levels of knowledge received higher scores for this item. The item was retained for the scaling of CIL items because its discrimination was judged to be still appropriate for the measurement of the latent trait.


Figure 11.4: Item characteristic curve by category for item B08Z

[Figure: observed and model-predicted probabilities of scores 0, 1, and 2 for Item 10 (B08Z) by latent trait (logits); weighted MNSQ 1.08; deltas -0.26 and 0.73]


We further analyzed the functioning of the constructed-response scoring guides by reviewing the proportion of responses in each score (to confirm that each score was represented in the scoring) and by confirming that the mean abilities of students achieving each score on an item (e.g., 0, 1, 2) discernibly increased with the score (i.e., that the mean ability of students achieving a score of 2 on a given item was higher than that of students achieving a score of 1, which was in turn higher than that of students receiving a score of 0). This analysis confirmed that all the constructed-response items included in the final set of scaled items were of satisfactory psychometric quality.
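The two checks on the scoring guides can be sketched for a single item as follows. This is a minimal illustration with hypothetical data: `score_category_check` is an illustrative name, abilities are assumed to be available as point estimates for each student, and missing scores are represented as `None` and ignored.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Optional, Sequence, Tuple


def score_category_check(scores: Sequence[Optional[int]], abilities: Sequence[float],
                         max_score: int) -> Tuple[Dict[int, Tuple[float, float]], bool, bool]:
    """Share of responses and mean ability for each score of one item.

    Returns (summary, all_scores_used, means_increase), where summary maps each
    observed score to (proportion of responses, mean ability). Missing scores
    (None) are ignored.
    """
    groups = defaultdict(list)
    for score, ability in zip(scores, abilities):
        if score is not None:
            groups[score].append(ability)
    n = sum(len(v) for v in groups.values())
    summary = {s: (len(groups[s]) / n, mean(groups[s])) for s in sorted(groups)}
    all_scores_used = all(s in groups for s in range(max_score + 1))
    means = [m for _, m in summary.values()]
    means_increase = all(a < b for a, b in zip(means, means[1:]))
    return summary, all_scores_used, means_increase


summary, used, increasing = score_category_check(
    [0, 0, 1, 1, 2, 2, None], [-1.0, -0.5, 0.0, 0.2, 1.0, 1.5, 0.3], max_score=2)
print(summary, used, increasing)
```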




Item adjudication outcomes 
In total, three CIL items (H04A, S05Z, and G05A) were not included in the calibration of items or the scaling of students’ CIL scores. H04A and S05Z had already been excluded from analysis in 2013 but were re-administered in 2018, while G05A was newly developed for ICILS 2018. Preliminary analysis of the 2018 data showed unsatisfactory item statistics for these three items, and it was decided to exclude them from further analysis. One further CIL item (S07Z) was excluded from the scaling of CIL items at a later stage of analysis due to unsatisfactory scaling properties.


We determined the item-rest correlations of correct (or partial credit) responses, that is, the correlation between the score on one item and the total raw score derived from all other items assigned to each student, as well as the weighted item fit statistics (Tables 11.1 and 11.2). Only five CIL items and none of the CT items had an item-rest correlation below 0.2 (which indicates rather low discrimination). We found unsatisfactory residual-based item fit statistics for only one CIL item and three CT items.


Tables 11.1 and 11.2 show the item-rest correlations of correct responses to multiple-choice items or scored partial credit items, and the weighted item fit statistics, for CIL and CT items, respectively. Information about item S07Z is still included in Table 11.1 even though the item was later removed from scaling.


For CIL items, the item-rest correlations ranged from 0.11 to 0.63, while CT items had values ranging from 0.24 to 0.70. Item-rest correlations of 0.20 or lower were usually flagged for further review (six items). Two of the CIL items (G04Z and S06Z) were included in the scaling of CIL items even though their weighted MNSQ statistics of around 1.20 or higher suggested somewhat less satisfactory fit to the model, reflected in less discrimination between high- and low-performing students than predicted by the model. Similar observations were made for the CT items TA05Z and TA07Z, which also showed relatively poor fit, with weighted MNSQ above 1.20.
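The review criteria described above can be expressed as a simple screening rule. The sketch below assumes the two cut-offs named in the text (item-rest correlation of 0.20 or lower, weighted MNSQ of 1.20 or higher) and applies them to a few rows from Tables 11.1 and 11.2. Flagging only marks items for further review; as the text notes, several flagged items were nonetheless retained. `ItemStats` and `flag_items` are illustrative names.

```python
from typing import List, NamedTuple


class ItemStats(NamedTuple):
    name: str
    item_rest: float
    weighted_mnsq: float


def flag_items(stats: List[ItemStats], min_item_rest: float = 0.20,
               max_mnsq: float = 1.20) -> List[str]:
    """Names of items flagged for review: low discrimination or high weighted MNSQ."""
    return [s.name for s in stats
            if s.item_rest <= min_item_rest or s.weighted_mnsq >= max_mnsq]


# A few rows taken from Tables 11.1 and 11.2.
sample = [
    ItemStats("G03Z", 0.11, 0.97),
    ItemStats("S06Z", 0.15, 1.21),
    ItemStats("R02Z", 0.35, 1.00),
    ItemStats("TA05Z", 0.37, 1.21),
    ItemStats("TA07Z", 0.42, 1.27),
    ItemStats("TF05TE", 0.70, 0.84),
]
print(flag_items(sample))  # ['G03Z', 'S06Z', 'TA05Z', 'TA07Z']
```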


Table 11.1: International CIL item-rest correlations and weighted item fit 
Item no.  Item name  Item-rest correlation  Weighted fit    Item no.  Item name  Item-rest correlation  Weighted fit
1   B01Z  0.31  1.11    42  S08C  0.56  1.00
2   B02Z  0.38  0.99    43  S08D  0.53  0.99
3   B03Z  0.35  0.98    44  S08E  0.54  0.88
4   B04Z  0.25  1.08    45  S08F  0.55  0.83
5   B05Z  0.36  1.02    46  S08G  0.50  1.02
6   B06Z  0.28  1.03    47  G01Z  0.21  1.10
7   B07A  0.23  1.14    48  G02Z  0.27  1.10
8   B07B  0.30  0.94    49  G03Z  0.11  0.97
9   B07C  0.36  1.06    50  G04Z  0.20  …
10  B08Z  0.44  1.12    51  G05B  0.42  0.97
11  B09A  0.58  0.82    52  G06Z  0.25  1.08
12  B09B  0.63  0.89    53  G07Z  0.31  1.13
13  B09C  0.61  0.93    54  G08Z  0.25  1.11
14  B09D  0.49  0.90    55  G09C  0.43  1.02
15  B09E  0.43  1.00    56  G09D  0.44  0.95
16  B09F  0.60  0.83    57  G09E  0.39  0.92
17  B09G  0.55  0.86    58  G09F  0.36  0.87
18  H01Z  0.35  1.04    59  G09G  0.44  0.96
19  H02Z  0.38  1.03    60  R01Z  0.48  0.93
20  H03Z  0.42  0.91    61  R02Z  0.35  1.00
21  H05Z  0.31  1.08    62  R03Z  0.34  1.06
22  H06Z  0.39  0.99    63  R04Z  0.38  1.01
23  H07A  0.55  0.93    64  R05A  0.58  1.17
24  H07B  0.58  0.89    65  R06A  0.34  1.06
25  H07C  0.60  0.94    66  R06B  0.34  1.01
26  H07D  0.49  1.17    67  R07Z  0.41  1.03
27  H07E  0.43  1.15    68  R08Z  0.37  1.02
28  H07F  0.50  0.92    69  R09Z  0.36  1.04
29  H07G  0.40  0.99    70  R10Z  0.25  1.04
30  H07H  0.57  0.97    71  R11A  0.53  1.04
31  H07I  0.48  0.88    72  R11B  0.51  0.86
32  H07J  0.29  0.90    73  R11C  0.53  0.84
33  S01Z  0.23  1.07    74  R11D  0.46  0.93
34  S02Z  0.23  1.12    75  R12A  0.19  1.09
35  S03Z  0.21  1.16    76  R12B  0.37  1.00
36  S04A  0.29  1.04    77  R12C  0.44  0.94
37  S04B  0.34  1.02    78  R12D  0.37  1.01
38  S06Z  0.15  1.21    79  R12G  0.26  1.03
39  S07Z  0.15  1.16    80  R12H  0.58  0.98
40  S08A  0.52  0.87    81  R12I  0.48  0.94
41  S08B  0.59  0.82    82  R12J  0.30  0.87




Table 11.2: International CT item-rest correlations and weighted item fit 


Item no.  Item name  Item-rest correlation  Weighted fit
1   TA01Z    0.46  1.12
2   TA02Z    0.4   1.13
3   TA03Z    …     1.18
4   TA04Z    0.24  1.09
5   TA05Z    0.37  1.21
6   TA06Z    0.54  0.98
7   TA07Z    0.42  1.27
8   TA08Z    0.34  1.12
9   TF01E    0.54  0.96
10  TF02L    0.48  0.86
11  TF03TEC  0.65  0.84
12  TF04TEC  0.68  0.80
13  TF05TE   0.70  0.84
14  TF06TE   0.69  0.80
15  TF07TE   0.67  0.83
16  TF08TEC  0.54  0.91
17  FO9      0.44  0.88


Dimensionality and local dependence 


We reviewed test dimensionality for CIL with multidimensional IRT modeling in ICILS 2013 using the ACER ConQuest software package (Adams et al. 2015). The results suggested that students’ CIL was best described using a single scale, given that latent correlations between potential sub-dimensions were very high (see Gebhardt and Schulz 2015). Analyses of the CT items provided a similar result, and only the overall CT scale was reported in ICILS 2018.

An analysis of local dependence was conducted by estimating fit statistics for user-defined combinations of items (Adams and Wu 2009), similar to the fit statistics that are estimated for each item parameter estimate. The expected fit statistic for groups of items is 1, indicating an absence of local dependence between items. A fit statistic of 1.3 or higher was flagged as indicative of local dependence.

The results from the main survey are presented in Figure 11.5. Fit statistics for complete modules (before any action to resolve local dependence was taken) were between 1.5 and 1.9. In all modules, local dependence was highest for items nested within the large tasks. Although lower than for items nested within large tasks, fit values above 1.3 indicated that a certain level of local dependence remained for single tasks in four out of five modules. This finding was not unexpected given the structure of the assessment, with many items corresponding to common themes or tasks, and similarly high levels of local dependence had already been identified for the CIL assessment in ICILS 2013 (Gebhardt and Schulz 2015). Similar observations have also been made with regard to the IEA Progress in International Reading Literacy Study (PIRLS), where sets of test items are related to common reading passages (Quittre and Monseur 2010).

Analyses to review local dependence were also undertaken between pairs of CIL items within the single tasks or the large task of each module. The purpose of this review was to identify pairs of items with high local dependence in order to collapse those items into one item (if justifiable according to the item content) or, where necessary, to remove one of the two items from the scale. Eleven such pairs were found (with a fit mean square statistic > 1.3). Table 11.3 lists these item pairs and the action taken to resolve the local dependence. Comparison of the fit statistics before and after removing items with unsatisfactory psychometric characteristics showed improvement, but local dependence was still evident within four out of five complete modules. When considering only large task items, local dependence was still evident within all five modules, but only within two of five modules when considering only single task items.

Figure 11.5: Local dependence within CIL modules

[Figure: fit mean square statistics for complete modules, large tasks, and single tasks within each CIL module (Band competition, Breathing, School trip, Board games, Recycling), before (original) and after (cleaned) removing items with unsatisfactory psychometric characteristics]


Table 11.3: CIL item pairs showing local dependence 


Module  Task type  Item 1  Item 2  MNSQ  T value  Action taken
G  large   G09C  G09D  1.36  29  G09C and G09G combined
G  large   G09C  G09G  1.56  42  G09C and G09G combined
H  large   H07A  H07D  1.42  33  H07D removed
H  large   H07C  H07D  1.40  31  H07D removed; H07B and H07C combined
H  large   H07D  H07F  1.33  25  H07D removed
H  large   H07D  H07G  1.33  25  H07D removed
H  large   H07H  H07I  1.42  …   H07H and H07I combined
R  single  R11A  R11B  1.36  24  R11A, R11B, R11C, and R11D combined
R  single  R11A  R11C  1.43  28  R11A, R11B, R11C, and R11D combined
R  large   R12B  R12C  1.32  28  R12B and R12C combined
S  single  S04A  S04B  1.37  31  S04A and S04B combined
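Table 11.3 does not state how the combined items were scored. One common approach, assumed here purely for illustration, is to sum the scores of the locally dependent items so that they enter the scaling as a single partial credit item whose maximum score equals the number of components. The helper `combine_items` and the name `R11_combined` are hypothetical.

```python
from typing import Dict, Optional, Sequence


def combine_items(responses: Dict[str, Optional[int]], group: Sequence[str],
                  new_name: str) -> Dict[str, Optional[int]]:
    """Collapse locally dependent items into one polytomous item by summing scores.

    `responses` maps item names to one student's scores (None = missing).
    Missing components contribute nothing to the sum; the combined item is
    missing only if all of its components are missing.
    """
    parts = [responses.get(name) for name in group]
    observed = [p for p in parts if p is not None]
    combined = {k: v for k, v in responses.items() if k not in group}
    combined[new_name] = sum(observed) if observed else None
    return combined


student = {"R11A": 1, "R11B": 1, "R11C": 0, "R11D": 1, "R12A": 0}
print(combine_items(student, ["R11A", "R11B", "R11C", "R11D"], "R11_combined"))
```

Summing preserves the total raw score while removing the separate, dependent contributions of each component from the likelihood.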


Assessment of scorer reliabilities 
The scoring of constructed-response items in the ICILS cognitive test was guided by the scoring guides that were developed for new ICILS 2018 items, then refined following the experiences in the international field trial, or adapted from ICILS 2013 for those items that were included to measure changes over time. Within countries, subsamples of about 20 percent of student responses to each task were scored twice following a controlled, random allocation to different scorers. The assignment of item responses to scorers was implemented and controlled as part of the online scoring systems (see Chapter 3). This double-scoring procedure allowed for the assessment of scorer reliabilities.


Table 11.4 shows the percentages of scorer agreement for CIL and CT items, based on 13 countries and benchmarking entities for CIL and eight countries and benchmarking entities for CT. The percentage of agreement between scores on double-scored items ranged from 40 to 100 percent across countries.


Table 11.4: Percentages of scorer agreement for constructed-response CIL and CT test items

[Table: percentages of scorer agreement for each double-scored constructed-response CIL and CT item in Chile, Denmark, Finland, France, Germany, Italy, Kazakhstan, Korea (Republic of), Luxembourg, Moscow (Russian Federation), Portugal, the United States, and Uruguay, together with the ICILS 2018 international average]


As has been the practice in other IEA studies, only items scored with a minimum of 70 percent scorer agreement were included in the international database. While scorer agreement was above 70 percent for all constructed-response items in the pooled ICILS 2018 dataset, we also reviewed and adjudicated scorer agreement at the national level. There were 37 cases where the scoring of an open-ended response item had an agreement below 70 percent. While scorer agreement was above 70 percent for all constructed-response items in eight out of 13 countries, in some countries this criterion was not met for several of these items. The scores for these items were excluded from the scaling of the corresponding national data when drawing plausible values but are included in the public database (see Appendix C).
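The national adjudication step can be sketched as computing exact percent agreement for each double-scored country-item combination and listing the combinations below the 70 percent criterion. This is a minimal illustration: the country and item labels and the scores are hypothetical, and agreement is assumed to mean identical scores from both scorers.

```python
from typing import Dict, List, Sequence, Tuple


def percent_agreement(first: Sequence[int], second: Sequence[int]) -> float:
    """Percentage of double-scored responses given identical scores by both scorers."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equally long, non-empty score lists")
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


def below_criterion(double_scores: Dict[Tuple[str, str], Tuple[List[int], List[int]]],
                    criterion: float = 70.0) -> List[Tuple[str, str, float]]:
    """(country, item, agreement) for every country-item pair below the criterion."""
    flagged = []
    for (country, item), (first, second) in sorted(double_scores.items()):
        agreement = percent_agreement(first, second)
        if agreement < criterion:
            flagged.append((country, item, agreement))
    return flagged


# Hypothetical double-scored responses for one item in two countries.
data = {
    ("Country A", "ITEM1"): ([0, 1, 2, 1, 0], [0, 1, 2, 1, 1]),
    ("Country B", "ITEM1"): ([0, 1, 2, 1], [1, 1, 0, 2]),
}
print(below_criterion(data))  # [('Country B', 'ITEM1', 25.0)]
```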


Differential item functioning by gender 


The analysis included an exploration of item quality by assessing differential item functioning (DIF) by gender. DIF occurs when groups of students with the same degree of ability differ in their probabilities of responding correctly to an item. For example, if boys have a higher probability than girls of the same ability of correctly answering an item, the item shows DIF with regard to gender. This suggests a violation of the model’s assumption that the probability of a correct response is exclusively a function of ability and not of any other characteristics of the respondents.


Itis possible to derive estimates of gender DIF by including interaction terms in the item response 
model. To achieve this, gender DIF was modeled for dichotomous items as: 

exp(6,~ (8j - 1, trig) 
1+exp(6,- (8;- nz +Aj,)) 


P; (6p) 


Here, $\theta_n$ is the estimated ability of person $n$, $\delta_i$ is the estimated location of item $i$, and $\lambda_{ig}$ is an additional parameter for the item-by-gender interaction. However, to obtain proper estimates, we also needed to include the overall gender effect ($\eta_g$) in the model.³ Both the item-by-gender interaction estimates ($\lambda_{ig}$) and the overall gender effects ($\eta_g$) were constrained to sum to 0.
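To make the dichotomous DIF model concrete, the following Python sketch evaluates it for two gender groups. The function name and all parameter values are hypothetical; the second group's terms are set to the negatives of the first to mimic the sum-to-zero constraints when there are two groups.

```python
import math


def p_correct_with_gender_dif(theta, delta, eta_g, lambda_ig):
    """Rasch probability of a correct response with gender DIF terms.

    Computes exp(z) / (1 + exp(z)) with z = theta - (delta - eta_g + lambda_ig),
    written in the numerically equivalent form 1 / (1 + exp(-z)).
    """
    z = theta - (delta - eta_g + lambda_ig)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters for one item and two gender groups of equal ability.
theta, delta = 0.0, 0.0
eta, lam = 0.10, 0.20
p_group_1 = p_correct_with_gender_dif(theta, delta, eta, lam)
p_group_2 = p_correct_with_gender_dif(theta, delta, -eta, -lam)
# Same ability, different probabilities: the item functions differently by gender.
print(round(p_group_1, 3), round(p_group_2, 3))  # 0.475 0.525
```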


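Gender DIF estimates for a partial credit model for items with more than two score categories (here, constructed-response items) could similarly be modeled as:

$$P_{x_i}(\theta_n) = \frac{\exp \sum_{k=0}^{x_i} \left(\theta_n - (\delta_i - \eta_g + \lambda_{ig} + \tau_{ik})\right)}{\sum_{h=0}^{m_i} \exp \sum_{k=0}^{h} \left(\theta_n - (\delta_i - \eta_g + \lambda_{ig} + \tau_{ik})\right)}, \qquad x_i = 0, 1, \ldots, m_i$$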


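Here, $\theta_n$ denotes the person's ability, $\delta_i$ gives the item location parameter on the latent continuum, $\tau_{ik}$ is the $k$th step parameter, $m_i$ is the maximum score of item $i$, $\lambda_{ig}$ is the item-by-gender interaction effect, and $\eta_g$ is the overall gender effect. By convention, the term for $k = 0$ in each sum is defined as 0.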
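A corresponding Python sketch for the partial credit version follows, again with a hypothetical function name and values. It assumes the usual convention that the cumulative sum for category 0 is zero, so the category-0 numerator is exp(0) = 1.

```python
import math


def pcm_probabilities_with_gender_dif(theta, delta, taus, eta_g=0.0, lambda_ig=0.0):
    """Category probabilities P(x), x = 0..m, for the partial credit model with gender DIF.

    `taus` holds the step parameters tau_i1..tau_im. The cumulative sum for
    category 0 is defined as 0, so its numerator is exp(0) = 1.
    """
    base = theta - (delta - eta_g + lambda_ig)
    numerators = [1.0]
    cumulative = 0.0
    for tau in taus:
        cumulative += base - tau  # adds theta - (delta - eta_g + lambda_ig + tau_ik)
        numerators.append(math.exp(cumulative))
    total = sum(numerators)
    return [value / total for value in numerators]


# Hypothetical three-category item (maximum score 2).
probs = pcm_probabilities_with_gender_dif(
    theta=0.3, delta=0.1, taus=[-0.5, 0.5], eta_g=0.05, lambda_ig=-0.02
)
print([round(p, 3) for p in probs], round(sum(probs), 6))
```

With a single step parameter of zero, the function reduces to the dichotomous model above.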


Tables 11.5 and 11.6 show the gender DIF estimates for CIL and CT items. Estimates above an absolute value of 0.3 logits were flagged as indicating a substantial amount of DIF. No item on either scale showed DIF large enough to suggest its removal from scaling.
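The 0.3-logit flagging rule amounts to a simple filter. This Python sketch applies it to some of the largest estimates in Tables 11.5 and 11.6 plus one invented item; the function name is hypothetical.

```python
def flag_gender_dif(estimates, cutoff=0.3):
    """Return items whose absolute gender DIF estimate (in logits) exceeds the cutoff."""
    return {item: est for item, est in estimates.items() if abs(est) > cutoff}


estimates = {
    "B09D": 0.20, "R01Z": -0.20, "G02Z": -0.19,  # CIL (Table 11.5)
    "TA03Z": 0.12,                               # CT (Table 11.6)
    "HYPOTHETICAL": 0.35,                        # invented for illustration
}
print(flag_gender_dif(estimates))  # {'HYPOTHETICAL': 0.35}
```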


3 The minus sign ensures that higher values of the gender effect parameters indicate higher levels of item endorsement 
in the gender group with higher value (here, females). 


SCALING PROCEDURES FOR ICILS 2018 TEST ITEMS 


Table 11.5: Gender DIF estimates for CIL test items 


Item no.  Item name  Gender DIF estimate    Item no.  Item name  Gender DIF estimate
1         B01Z       -0.04                  42        S08C        0.02
2         B02Z       -0.15                  43        S08D       -0.10
3         B03Z       -0.11                  44        S08E        0.08
4         B04Z       -0.03                  45        S08F        0.15
5         B05Z        0.04                  46        S08G        0.10
6         B06Z       -0.05                  47        G01Z       -0.12
7         B07A       -0.08                  48        G02Z       -0.19
8         B07B        0.04                  49        G03Z        0.01
9         B07C       -0.04                  50        G04Z       -0.16
10        B08Z       -0.10                  51        G05B       -0.06
11        B09A        0.15                  52        G06Z       -0.02
12        B09B        0.08                  53        G07Z        0.09
13        B09C        0.01                  54        G08Z        0.09
14        B09D        0.20                  55        G09C       -0.04
15        B09E        0.13                  56        G09D       -0.0
16        B09F        0.07                  57        G09E        0.04
17        B09G        0.13                  58        G09F       -0.09
18        H01Z       -0.14                  59        G09G       -0.12
19        H02Z       -0.03                  60        R01Z       -0.20
20        H03Z       -0.08                  61        R02Z       -0.05
21        H05Z       -0.07                  62        R03Z        0.01
22        H06Z        0.02                  63        R04Z       -0.12
23        H07A        0.09                  64        R05A       -0.11
24        H07B        0.06                  65        R06A       -0.03
25        H07C        0.04                  66        R06B        0.00
26        H07D       -0.03                  67        R07Z        0.14
27        H07E        0.02                  68        R08Z        0.06
28        H07F        0.00                  69        R09Z        0.00
29        H07G       -0.03                  70        R10Z        0.13
30        H07H        0.04                  71        R11A        0.06
31        H07I       -0.01                  72        R11B        0.04
32        H07J        0.03                  73        R11C        0.08
33        S01Z        0.00                  74        R11D        0.03
34        S02Z        0.00                  75        R12A       -0.05
35        S03Z       -0.01                  76        R12B       -0.04
36        S04A       -0.04                  77        R12C       -0.05
37        S04B       -0.05                  78        R12D        0.02
38        S06Z       -0.10                  79        R12G        0.00
39        S07Z       -0.17                  80        R12H        0.01
40        S08A        0.12                  81        R12I       -0.04
41        S08B        0.14                  82        R12J       -0.08




146 ICILS 2018 TECHNICAL REPORT 


Table 11.6: Gender DIF estimates for CT test items 


Item no.  Item name  Gender DIF estimate
1         TA01Z      -0.02
2         TA02Z       0.09
3         TA03Z       0.12
4         TA04Z      -0.04
5         TA05Z       0.00
6         TA06Z      -0.03
7         TA07Z       0.07
8         TA08Z      -0.03
9         TF01E       0.07
10        TF02L       0.00
11        F03TEC      0.06
12        F04TEC      0.00
13        F05TEC      0.01
14        F06TEC      0.03
15        F07TEC     -0.09
16        F08TEC     -0.14
17        F09        -0.14


National reports with item statistics 


National centers were provided with item statistics (see the example for B01Z in Figure 11.6) and requested to review any flags for the respective test items. Flags included cases of unusual correlation (e.g., negative correlations between correct response and overall score) and those showing large differences between national and international item difficulties. They also included open-ended items where the category-total correlations were disordered. In some cases, national centers informed the international study center of translation, scoring, or technical problems that had not been detected during verification. In these cases, we categorized the items as “not administered” in the international database and excluded them from scaling of the corresponding national data.




Working independently from the item reviews by national centers, international study center staff 
flagged national items that showed poor scaling properties (such as item misfit or large item-by- 
country interactions) and conducted post-verifications of item translation. In one instance, we 
identified a national item that needed to be set to “not administered” in the international database 
and was consequently also excluded from the scaling of the corresponding national data. 


Appendix C provides details about items that were excluded from scaling or deleted from the 
database. 


Cross-national measurement equivalence 


With any test used to assess student achievement cross-nationally, it is important that the test items function similarly across countries. As with DIF by gender (see above), items show item-by-country interaction when students from different countries but with the same ability differ in their probability of answering these items correctly. Test items with considerable item-by-country interaction are not suitable for scaling in international surveys.


For the main survey analyses of test items, national item parameter calibrations were compared 
with international item parameters in order to assess the occurrence of item-by-country interaction. 




Figure 11.6: Example of item statistics provided to national centers

B01Z: Band competition - 01 (Item Format: constructed response auto-coded)

Number of cases: 1153
Adjusted correlation: 0.18
Item threshold(s): -1.257
Fit (weighted MNSQ): 1.13
Item delta(s): -1.257

Response         0        1        9
Score            0        1        0
Students         223      896      34
% NAT            19.34    77.71    2.95
% INT            33.9     60.35    5.75
Ability average  -0.37    0.061    -1.48
Ability SD       1.084    0.968    0.975
Pt Bis           -0.1     0.19     -0.23

[Charts: average ability by category; point biserial by category]

Aggregated statistics (international value; national value):
Fit: 1.09; 1.13
Adjusted correlation: 0.323; 0.184
Delta (item difficulty): -0.969; -1.257
Item-category threshold: -0.969; -1.257

Number of countries flagged (of 13): easier than expected, 4; harder than expected, 4; non-key PB is positive, 0; key PB is negative, 0; low adjusted correlation, 2; ability not ordered, 0; small fit (high discrimination), 0; large fit (low discrimination), 0




Confidence intervals were computed for each national item parameter, basing the computation on the respective standard errors and adjusting them for possible design effects and for multiple comparisons.
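The chapter does not give the exact formulas for these intervals. The following Python sketch shows one common way to implement them, assuming that the standard error is inflated by the square root of a design effect and that a Bonferroni correction is applied over the number of participating countries. The item deltas come from Figure 11.6; the standard error, design effect, and function names are hypothetical.

```python
from statistics import NormalDist


def national_item_interval(national_delta, se, design_effect, n_comparisons, alpha=0.05):
    """Confidence interval for a national item parameter (sketch).

    The standard error is inflated by sqrt(design_effect), and the critical value
    is Bonferroni-adjusted for n_comparisons simultaneous comparisons.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_comparisons))
    half_width = z * se * design_effect ** 0.5
    return national_delta - half_width, national_delta + half_width


def compare_with_international(national_delta, international_delta, se, design_effect, n_comparisons):
    low, high = national_item_interval(national_delta, se, design_effect, n_comparisons)
    if high < international_delta:
        return "easier than expected"  # national difficulty significantly lower
    if low > international_delta:
        return "harder than expected"  # national difficulty significantly higher
    return "no significant difference"


print(compare_with_international(-1.257, -0.969, se=0.05, design_effect=2.0, n_comparisons=13))
# easier than expected
```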


As an example, Figure 11.7 shows the item-by-country interaction graph for CIL item H07E. The figure shows clear and considerable variation in the relative difficulty of the item across countries.


Similar graphs produced for each test item were used in the test-item adjudication process at the international and national levels, while information about the occurrence of cross-national DIF was used to identify items for post-verification checks after completion of the main data collection.


Figure 11.7: Example of item-by-country interaction graph for item H07E
[Graph: item location (in logits), on a scale from -6.0 to 6.0, for CHL, DEU, DNK, FIN, FRA, ITA, KAZ, KOR, LUX, PRT, RMO, URY, and USA]

Although the ICILS test items generally showed only limited item-by-country interactions, there were some national item difficulties that deviated quite considerably (by more than 1.3 logits) from the international item difficulty. In these cases (see, for example, Italy in Figure 11.7), we omitted the items from scaling. Appendix C includes the complete list of items that were omitted nationally from scaling because of substantial item-by-country interaction.
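The 1.3-logit omission rule can be expressed as a short filter. The function name, country codes, item name, and difficulties below are hypothetical.

```python
def items_to_omit_nationally(national, international, max_deviation=1.3):
    """List (country, item, deviation) where a national item difficulty deviates from
    the international difficulty by more than max_deviation logits."""
    flagged = []
    for (country, item), national_delta in national.items():
        deviation = national_delta - international[item]
        if abs(deviation) > max_deviation:
            flagged.append((country, item, round(deviation, 3)))
    return sorted(flagged)


international = {"ITEM_X": 0.40}
national = {
    ("AAA", "ITEM_X"): 0.55,
    ("BBB", "ITEM_X"): 2.10,
    ("CCC", "ITEM_X"): -1.00,
}
print(items_to_omit_nationally(national, international))
# [('BBB', 'ITEM_X', 1.7), ('CCC', 'ITEM_X', -1.4)]
```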


Evaluating the impact of missing responses 


There were two possible types of missing responses in the ICILS test: “omitted” items (coded in the database as 9) and “not-administered” items (coded as 8). The omitted-response category was used when a student provided no response at all to an item administered to him or her. Not-administered items were those that, although in the whole item pool, were not in the set of modules (two out of the five CIL modules) administered to a student. Not-administered items occurred either by design (due to the assignment and rotation of modules) or, in the few cases described earlier in this chapter, due to technical failure, incorrect translations, or poor scaling properties.


A separate missing category called “not reached” (coded as 7) was created for analysis and subsequent scaling purposes at the post-processing stage. An item was assigned this code if the student concerned did not respond to the item immediately preceding it and also did not respond to any of the items following this item within the same module (i.e., did not continue on to the end of the test). The extent of occurrence of Code 7 items provided us with information about the eventual appropriateness of the test length as well as the appropriateness of its difficulty, following similar analysis at the field trial stage.
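A Python sketch of the “not reached” rule for one student's responses within a module follows. It assumes responses are either valid scores or the omitted code 9 (not-administered codes are left out for simplicity), and the function name is hypothetical.

```python
OMITTED, NOT_REACHED = 9, 7


def recode_not_reached(module_responses):
    """Recode trailing omissions within one module as 'not reached' (code 7).

    An item becomes 7 when the item immediately before it was not responded to and
    neither it nor any later item in the module was responded to. The first item of
    a trailing run of omissions therefore keeps the omitted code 9.
    """
    codes = list(module_responses)
    last_answered = -1
    for position, code in enumerate(codes):
        if code != OMITTED:
            last_answered = position
    # Everything after last_answered is 9; all but the first of these become 7.
    for position in range(last_answered + 2, len(codes)):
        codes[position] = NOT_REACHED
    return codes


print(recode_not_reached([1, 0, 9, 1, 9, 9, 9]))  # [1, 0, 9, 1, 9, 7, 7]
```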




Tables 11.7 and 11.8 show the international percentages of omitted and not-reached responses. Table 11.9 shows the average percentages of missing values overall and for each module. On average, each item was omitted by nearly eight percent of the students, while the average percentage of students who did not reach an item was about one percent.


Table 11.7: Percentages of omitted responses and items not reached due to lack of time for CIL items 


Item no.  Item name  Omitted  Not reached    Item no.  Item name  Omitted  Not reached
1         B01Z       5.73     0.00           42        S08C       3.33     1.94
2         B02Z       1.32     0.00           43        S08D       3.33     1.94
3         B03Z       25.45    0.02           44        S08E       3.33     1.94
4         B04Z       7.05     0.04           45        S08F       3.33     1.94
5         B05Z       1.35     0.06           46        S08G       3.33     1.94
6         B06Z       7.92     0.10           47        G01Z       2.41     0.00
7         B07A       1.35     0.25           48        G02Z       3.46     0.00
8         B07B       1.36     0.25           49        G03Z       5.64     0.04
9         B07C       1.35     0.25           50        G04Z       1.14     0.06
10        B08Z       1.93     0.33           51        G05B       3.36     0.10
11        B09A       4        0.95           52        G06Z       9.43     0.15
12        B09B       4        0.95           53        G07Z       0.64     0.30
13        B09C       4.1      0.95           54        G08Z       0.46     0.37
14        B09D       4.1      0.95           55        G09C       25.94    0.64
15        B09E       4.1      0.95           56        G09D       25.94    0.64
16        B09F       4.1      0.95           57        G09E       25.94    0.64
17        B09G       4.11     0.95           58        G09F       25.94    0.64
18        H01Z       2.97     0.00           59        G09G       25.94    0.64
19        H02Z       …        0.00           60        R01Z       1.27     0.00
20        H03Z       3.75     0.05           61        R02Z       10.52    0.00
21        H05Z       4.73     0.89           62        R03Z       6.36     0.02
22        H06Z       2.78     1.40           63        R04Z       9.04     0.02
23        H07A       1.82     1.88           64        R05A       3.35     0.04
24        H07B       1.82     1.88           65        R06A       7.02     0.08
25        H07C       1.82     1.88           66        R06B       7.02     0.08
26        H07D       1.82     1.88           67        R07Z       4.82     0.23
27        H07E       1.82     1.88           68        R08Z       11.68    0.30
28        H07F       1.82     1.88           69        R09Z       1.55     0.61
29        H07G       1.82     1.88           70        R10Z       1.19     0.70
30        H07H       1.82     1.88           71        R11A       13.24    0.92
31        H07I       1.82     1.88           72        R11B       13.24    0.92
32        H07J       11.82    1.88           73        R11C       13.88    1.01
33        S01Z       6.68     0.00           74        R11D       13.24    0.92
34        S02Z       1.99     0.00           75        R12A       7.38     5.41
35        S03Z       34.10    0.02           76        R12B       7.38     5.41
36        S04A       1.17     0.05           77        R12C       7.38     5.41
37        S04B       1.17     0.05           78        R12D       7.38     5.41
38        S06Z       1.82     0.40           79        R12G       7.38     5.41
39        S07Z       7.02     0.52           80        R12H       7.38     5.41
40        S08A       3.30     1.94           81        R12I       7.38     5.41
41        S08B       3.33     1.94           82        R12J       7.38     5.41




Table 11.8: Percentages of omitted responses and items not reached due to lack of time for CT items 


Item no.  Item name  Omitted  Not reached
1         TA01Z      12.94    0.00
2         TA02Z      10.24    0.00
3         TA03Z      …        0.17
4         TA04Z      2.94     0.32
5         TA05Z      3.69     0.36
6         TA06Z      2.81     0.75
7         TA07Z      11.04    2.09
8         TA08Z      5.39     8.19
9         TF01E      1.84     0.00
10        TF02L      3.33     0.00
11        F03TEC     4.01     0.40
12        F04TEC     2.96     0.67
13        F05TEC     3.82     …
14        F06TEC     17.40    2.21
15        F07TEC     8.33     6.07
16        F08TEC     30.69    10.14
17        TF09T      18.63    30.43


Table 11.9: Percentages of omitted responses and items not reached due to lack of time overall, by module 


                  Average percentage
Module            Omitted  Not reached
Band competition  4.9      0.5
Breathing         8.9      1.4
School trip       5.5      1.0
Board games       12.0     0.3
Recycling         7.7      2.1
Grand average     7.6      1.2


When comparing the proportions of omitted and not-reached responses across CIL modules, we observed that the Board games module had the highest percentage of omitted responses (12%), while the Band competition module had the lowest (5%). Average percentages of not-reached responses were low for most modules; the Recycling module showed a somewhat higher percentage of not-reached responses than the other modules.


International item calibration and test reliability 


Item parameters for CIL were obtained from a joint data file that included response data from both ICILS 2013 and ICILS 2018. We included the ICILS 2013 data to improve the estimation of link items and for the purpose of equating, using the preferred IEA joint calibration methodology also applied for the equating of TIMSS, PIRLS, and ICCS data (see Foy and Yin 2016, 2017; Gebhardt and Schulz 2018). We included ICILS 2018 data from all 11 countries that met the sampling requirements and ICILS 2013 data from three countries that participated in both 2013 and 2018 and had met sample participation requirements. Denmark participated in both 2013 and 2018 but did not meet sample participation requirements in 2018, and so was not included in the joint data file. Countries were equally weighted within each ICILS cycle for the CIL calibration, and all items were included (except for items that were deleted nationally or internationally following the adjudication process).




Table 11.10: Final parameter estimates of the CIL items 


Item no.  Item name  Item parameter  Step 1        Item no.  Item name  Item parameter  Step 1  Step 2
1         B01Z       -0.939                        42        S08C        0.017           3.145
2         B02Z       -0.895                        43        S08D        0.231           0.025
3         B03Z        0.168                        44        S08E        0.632           0.911
4         B04Z        0.701                        45        S08F        0.931          -0.653
5         B05Z       -0.016                        46        S08G       -0.285          -0.085
6         B06Z       -2.678                        47        G01Z        1.424
7         B07A       -3.766                        48        G02Z       -0.372
8         B07B       -3.326                        49        G03Z        1.742
9         B07C       -0.905                        50        G04Z       -1.045
10        B08Z        0.130          -0.746        51        G05B       -0.487
11        B09A        0.197          -1.683        52        G06Z        1.046
12        B09B       -0.712           0.915        53        G07Z       -1.543
13        B09C       -0.487           0.859        54        G08Z       -0.270
14        B09D       -0.448                        55        G09C        0.255          -0.256
15        B09E        0.852                        56        G09D        0.180
16        B09F        0.179                        57        G09E        1.490
17        B09G        0.321                        58        G09F        1.787
18        H01Z       -0.648                        59        G09G       -0.052
19        H02Z        0.612                        60        R01Z       -0.454
20        H03Z       -1.509                        61        R02Z        1.033
21        H05Z        1.456                        62        R03Z       -0.382
22        H06Z        0.529                        63        R04Z        0.166
23        H07A       -1.481                        64        R05A       -0.201           0.729  -0.875
24        H07B        0.197           0.108        65        R06A       -0.456
25        H07C       -0.374          -0.015        66        R06B        0.631
26        H07D       -0.743           0.245        67        R07Z        0.437
27        H07E       -0.168           0.079        68        R08Z        0.330
28        H07F        0.300           0.231        69        R09Z       -0.798
29        H07G        0.945           0.421        70        R10Z        0.647
30        H07H       -0.128           0.344        71        R11A       -1.305           2.717
31        H07I        0.291                        72        R11B       -1.283
32        H07J        1.615                        73        R11C       -1.226
33        S01Z       -1.971                        74        R11D       -0.029
34        S02Z        2.665                        75        R12A        2.088
35        S03Z        0.320                        76        R12B        0.502
36        S04A        1.188                        77        R12C       -0.708
37        S04B       -1.006                        78        R12D        0.821
38        S06Z        0.088                        79        R12G        2.044
39        S07Z        2.613                        80        R12H        0.402           0.150
40        S08A       -0.509                        81        R12I        1.325
41        S08B       -0.027                        82        R12J        1.870




Table 11.11: Final parameter estimates of the CT items 


Item no.  Item name  Item parameter  Step 1  Step 2
1         TA01Z      -0.234           0.485
2         TA02Z      -1.109           0.317
3         TA03Z      -0.646           1.157
4         TA04Z       0.733
5         TA05Z       0.233          -0.047
6         TA06Z       0.017           0.225
7         TA07Z       1.105           0.711
8         TA08Z       0.710          -0.377
9         TF01E      -1.205           …      -0.030
10        TF02L      -1.587           0.818
11        F03TEC     -0.278          -0.466  -1.546
12        F04TEC     -0.578           0.031  -1.497
13        F05TEC     -0.132           0.921   0.454
14        F06TEC      0.635           …      -0.585
15        F07TEC      0.544          -0.429   0.603
16        F08TEC      0.700           2.042
17        F09         1.090           1.663
Item parameters for CT were obtained from a data file that included response data from all seven 
countries that met the sampling requirements for ICILS 2018. 
Missing student responses that were likely to be due to problems with test length (“not reached” items) were excluded from the calibration of item parameters but were included and treated as “incorrect” when scaling the student responses. Items for which technical failures occurred were treated as not administered. Omitted items were treated as incorrect at both stages.
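The different treatment of missing codes at the two stages can be summarized in a small mapping function. This is an illustrative Python sketch with a hypothetical function name; it marks responses excluded from the likelihood with `None`.

```python
NOT_REACHED, NOT_ADMINISTERED, OMITTED = 7, 8, 9


def prepare_responses(codes, stage):
    """Map missing-response codes for item calibration or student scaling (sketch).

    Returns a list in which None marks a response left out of the likelihood.
    """
    if stage not in ("calibration", "scaling"):
        raise ValueError("stage must be 'calibration' or 'scaling'")
    prepared = []
    for code in codes:
        if code == NOT_ADMINISTERED:   # by design or after a technical failure
            prepared.append(None)
        elif code == OMITTED:          # incorrect at both stages
            prepared.append(0)
        elif code == NOT_REACHED:      # excluded for calibration, incorrect for scaling
            prepared.append(None if stage == "calibration" else 0)
        else:
            prepared.append(code)      # valid score
    return prepared


responses = [1, 9, 8, 7, 2]
print(prepare_responses(responses, "calibration"))  # [1, 0, None, None, 2]
print(prepare_responses(responses, "scaling"))      # [1, 0, None, 0, 2]
```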


From this, we identified a set of item parameters that we used to scale the ICILS 2018 CIL data 
(Table 11.10). The final set of CT item parameters is displayed in Table 11.11. 


The overall test reliabilities for the two cognitive assessments following the removal of items with 
non-satisfactory psychometric properties, based on the pooled datasets and obtained from the 
scaling model, were 0.97 (CIL) and 0.84 (CT) (ACER ConQuest 4.0 estimate). 


CIL and CT estimates 


The accuracy of measuring the latent ability $\theta$ at the individual level can be improved by using a
larger number of test items. However, in large-scale surveys such as ICILS, the purpose is to obtain 
accurate population estimates through use of instruments that also cover a wider range of possible 
aspects of cognitive abilities. 


The use of a matrix-sampling design, where individual students are allocated modules in a systematic way and respond to a set of items obtained from the main pool of items, has become standard in assessments of this type (see Chapter 2). However, reducing test length and administering subsets of items to individual students introduces a considerable degree of uncertainty at the individual level. Aggregating such individual ability estimates can lead to bias in population estimates. This problem can be addressed by treating a student's ability estimate as a missing data problem and employing plausible value methodology that uses all available information from student tests and questionnaires to impute an ability estimate, a process that leads to more accurate population as well as sub-group estimates (Mislevy 1991; Mislevy and Sheehan 1987; von Davier et al. 2009).




Using item parameters anchored at their estimated values from the calibration sample makes it 
possible to randomly draw plausible values from the marginal posterior of the latent distribution for 
each individual. Estimations are based on the conditional item response model and the population 
model, which includes the regression on background variables and between-school differences 
used for conditioning. (For a detailed description see Adams et al. 1997; also Adams 2002.) In 
order to obtain estimates of students’ CIL and CT, we used the ACER ConQuest 4.0 software to 
draw plausible values. 


All available international student questionnaire variables were used for conditioning on 
background information on students. To take missing responses into account, all missing values 
in a variable were substituted with either the mode or the mean and corresponding indicators 
for the occurrence of missing values were added as additional variables. Appendix D lists all the 
international student-level variables (along with their respective scoring) that were used in the 
conditioning of plausible values for CIL and CT. 
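The substitution of missing values combined with missing indicators can be sketched as follows. The text does not say which variables received the mean and which the mode; this Python sketch assumes the mean for continuous and the mode for categorical variables, and the function name and data are hypothetical.

```python
from statistics import mean, mode


def impute_with_indicator(values, kind="mean"):
    """Replace missing values (None) with the observed mean or mode and build an indicator.

    Returns (filled_values, missing_indicator), where the indicator is 1 for
    originally missing entries and 0 otherwise.
    """
    observed = [value for value in values if value is not None]
    if not observed:
        raise ValueError("the variable has no observed values")
    fill = mean(observed) if kind == "mean" else mode(observed)
    filled = [fill if value is None else value for value in values]
    indicator = [1 if value is None else 0 for value in values]
    return filled, indicator


print(impute_with_indicator([2.0, None, 4.0]))                    # ([2.0, 3.0, 4.0], [0, 1, 0])
print(impute_with_indicator(["a", "b", None, "a"], kind="mode"))  # (['a', 'b', 'a', 'a'], [0, 0, 1, 0])
```

Both the filled variable and its indicator would then enter the conditioning model.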


Because of the large number of variables, we used principal component analysis (PCA) to reduce the student-level background variables to a smaller set of conditioning variables (principal components) that together reflected 99 percent of the variance in the original variables. At the student level, only gender and its corresponding missing indicator were included as direct conditioning variables. To account for between-school differences, stratum indicators and the average of (weighted likelihood) ability estimates for all other students in the same school were also introduced as direct conditioning variables.
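Choosing how many principal components to retain under the 99 percent criterion reduces to a cumulative-variance rule on the eigenvalues. In practice, the eigenvalues would come from an eigendecomposition of the covariance or correlation matrix (for example, with `numpy.linalg.eigh`). This standard-library sketch assumes they are already available; the function name and eigenvalues are hypothetical.

```python
def components_for_variance_share(eigenvalues, share=0.99):
    """Smallest number of principal components whose eigenvalues reach `share` of total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= share:
            return count
    return len(ordered)


# Hypothetical eigenvalues of a correlation matrix of five conditioning variables.
print(components_for_variance_share([0.04, 6.0, 0.95, 3.0, 0.01]))  # 3
```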


For ICILS 2013, the CIL scale was originally established and transformed to a metric with a mean 
of 500 and a standard deviation of 100 for equally weighted ICILS 2013 countries that had 
met sampling requirements (categories 1 and 2), also excluding benchmarking participants (see 
Gebhardt and Schulz 2015). This linear transformation was computed by applying the formula: 


$$\theta'_{n(\mathrm{CIL})} = 500 + 100 \times \frac{\theta_n - \mu_{\theta(\mathrm{CIL13})}}{\sigma_{\theta(\mathrm{CIL13})}}$$


where $\theta'_{n(\mathrm{CIL})}$ were the student scores in the international metric, $\theta_n$ were the original logit scores, $\mu_{\theta(\mathrm{CIL13})}$ was the average of students' CIL logit scores (-0.119) for a pooled dataset with equally weighted national ICILS 2013 samples from countries that had met IEA sample participation requirements, and $\sigma_{\theta(\mathrm{CIL13})}$ was the corresponding standard deviation (1.186). This transformation was applied to each of the five plausible values reflecting CIL derived for ICILS 2013. The equating of CIL scores derived in ICILS 2018 is described in the next section.


To transform the original CT scores in ICILS 2018 to a new reporting metric for this cycle, a transformation similar to that used for establishing the CIL scale metric was applied:


$$\theta'_{n(\mathrm{CT})} = 500 + 100 \times \frac{\theta_{n(\mathrm{CT})} - \mu_{\theta(\mathrm{CT18})}}{\sigma_{\theta(\mathrm{CT18})}}$$


Here, $\theta'_{n(\mathrm{CT})}$ represents the student CT scores in the international metric, $\theta_{n(\mathrm{CT})}$ the original logit scores, $\mu_{\theta(\mathrm{CT18})}$ the average of students' CT logit scores (-0.1490) for a pooled dataset with seven equally weighted national ICILS 2018 samples from countries that had participated in the international option and met IEA sample participation requirements, and $\sigma_{\theta(\mathrm{CT18})}$ the corresponding standard deviation (0.9702).
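Both reporting-scale transformations are the same linear map with different reference values. In this Python sketch the constants are taken from the text above, while the function name and plausible values are hypothetical (chosen to land on round numbers). ICILS 2018 CIL logits must first be equated to the 2013 logit scale (see the next section) before the CIL constants apply.

```python
def to_reporting_metric(theta, mean_logit, sd_logit):
    """Linear transformation of a logit score to a metric with mean 500 and SD 100."""
    return 500.0 + 100.0 * (theta - mean_logit) / sd_logit


CIL13_MEAN, CIL13_SD = -0.119, 1.186   # ICILS 2013 reference values for CIL
CT18_MEAN, CT18_SD = -0.1490, 0.9702   # ICILS 2018 reference values for CT

# The transformation is applied to each plausible value separately.
ct_plausible_values = [-0.1490, 0.8212, -1.1192]
print([round(to_reporting_metric(pv, CT18_MEAN, CT18_SD), 1) for pv in ct_plausible_values])
# [500.0, 600.0, 400.0]
```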




Equating CIL scores from ICILS 2013 and 2018 


To transform ICILS 2018 CIL scores to the scale originally established with ICILS 2013 data, it was necessary to equate the new scale scores. As mentioned earlier, all ICILS 2013 item parameters were re-estimated concurrently during the ICILS 2018 joint calibration process.


Before joining the data from the two assessment cycles, we reviewed the relative difficulties of the 
common items to evaluate the quality of the link. In total there were 46 items common between 
the two ICILS assessment cycles. To compare relative difficulties of the 46 common items between 
the two assessments, all ICILS 2018 items were also calibrated separately using the data from 
11 countries (excluding two benchmarking entities and the data from the United States as these 
participants did not meet the sampling requirements). These item difficulties were compared with 
the item difficulties estimated from the calibration sample in ICILS 2013. 


The difference in relative difficulty for each item between the values estimated in the 2013 and 2018 calibrations was used to assess the quality of the items as a link. Ideally, these differences would be zero. A difference of more than half a logit was considered large and would result in breaking the link for that item. After careful consideration of the results of various comparisons, a final set of 36 link items was selected. The remaining 10 items common to the two cycles were kept in the joint calibration dataset but were treated as separate items.
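The half-logit rule can be sketched in Python as below. The chapter does not define “relative” difficulty precisely; this sketch assumes it means each item's difficulty minus the mean difficulty of the common items within the same calibration. The function name, item names, and values are hypothetical.

```python
def split_link_items(delta_2013, delta_2018, max_difference=0.5):
    """Separate common items into retained link items and items whose link is broken.

    Relative difficulty is taken as the item difficulty minus the mean difficulty
    of the common items within the same calibration (an assumption of this sketch).
    """
    common = sorted(set(delta_2013) & set(delta_2018))
    if not common:
        raise ValueError("no common items")
    mean_2013 = sum(delta_2013[item] for item in common) / len(common)
    mean_2018 = sum(delta_2018[item] for item in common) / len(common)
    kept, broken = [], []
    for item in common:
        change = (delta_2018[item] - mean_2018) - (delta_2013[item] - mean_2013)
        (broken if abs(change) > max_difference else kept).append(item)
    return kept, broken


delta_2013 = {"I1": -1.0, "I2": 0.0, "I3": 1.0}
delta_2018 = {"I1": -0.9, "I2": 0.1, "I3": 1.9}
print(split_link_items(delta_2013, delta_2018))  # (['I1', 'I2'], ['I3'])
```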


Figure 11.8: Relative item difficulties for CIL common items in 2013 and 2018
[Scatterplot: relative item difficulty in 2013 against relative item difficulty in 2018 for the common items, in logits]




To construct the ICILS 2018 scale, the data from 11 countries that met the sampling requirements in 2018 were merged with the data from three trend countries collected in the ICILS 2013 assessment. A concurrent calibration was performed, with the assessment cycle included as a regressor, to estimate the parameters of all items used over the two assessment cycles. In total, item difficulties were estimated for 111 items. These included all 82 items used in ICILS 2018, 19 items used only in ICILS 2013, and an additional 10 items that were unlinked and thus estimated separately for 2013 and 2018 (no overlap of the data in the combined dataset).


For equating purposes, the new item parameter estimates were used to redraw plausible values 
for the ICILS 2013 sample, using full conditioning. We only included the three countries that 
met the sampling requirements in both cycles. Subsequently, we computed the pooled mean and standard deviation of the plausible values on the 2013 scale and on the 2018 scale. Comparing those distributions resulted in the following linear transformation to equate the ICILS 2018 student abilities onto the historical ICILS 2013 scale in logits:


$$\theta_n^{(2013)} = 1.0258 \times \theta_n^{(2018)} + 0.2127$$
These equated plausible values were subsequently placed on the ICILS 2018 international reporting scale by applying the same transformation as in ICILS 2013:


$$\theta'_n = 500 + 100 \times \frac{\theta_n^{(2013)} - (-0.1188)}{1.1859}$$
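The two steps (equating to the 2013 logit scale, then applying the 2013 reporting transformation) can be chained in a single function. The constants come from the two formulas above; the function name and plausible values are hypothetical.

```python
EQUATING_SLOPE, EQUATING_INTERCEPT = 1.0258, 0.2127
MU_CIL13, SD_CIL13 = -0.1188, 1.1859


def cil_2018_reporting_score(theta_2018):
    """Equate a 2018 CIL logit to the 2013 logit scale, then map it to the reporting scale."""
    theta_on_2013_scale = EQUATING_SLOPE * theta_2018 + EQUATING_INTERCEPT
    return 500.0 + 100.0 * (theta_on_2013_scale - MU_CIL13) / SD_CIL13


# Hypothetical set of five plausible values for one student (2018 logits).
plausible_values = [-0.40, -0.10, 0.00, 0.15, 0.30]
print([round(cil_2018_reporting_score(pv), 1) for pv in plausible_values])
```

Because the equating slope is not exactly 1, the order of the two steps matters; the sketch follows the order given in the text.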


Because the transformation equating the ICILS 2018 data with the ICILS 2013 data depended on 
the change in the degree of difficulty of each of the individual link items, the sample of link items 
chosen influenced the choice of transformation. This meant that the resulting transformation 
would have been slightly different if we had chosen an alternative set of link items. Uncertainty in 
the transformation thus relates to the sampling of the link items, in the same way that uncertainty 
in values such as country averages is an outcome of the particular sample of students that is used. 


The uncertainty resulting from link-item sampling is referred to as linking error, and it is an error 
that analysts have to take into account when comparing the results arising out of different data 
collections (see Monseur and Berezner 2007). As is the situation with the error that is introduced 
through the process of sampling students, the exact magnitude of this linking error cannot be 
determined. We can, however, estimate the likely range of magnitudes for this error and take it 
into account when interpreting results. As with sampling errors, the likely range of magnitude for 
the errors is represented as a standard error. 


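The following approach has been used to estimate the equating error. Suppose we have a total of $L$ score points in the link items in $K$ modules. Use $i$ to index items in a module and $j$ to index modules so that $\hat{\delta}_{ij}^{y}$ is the estimated difficulty of item $i$ in module $j$ for year $y$, and let:

$$c_{ij} = \hat{\delta}_{ij}^{2018} - \hat{\delta}_{ij}^{2013}$$

The size (number of score points) of module $j$ is $m_j$, so that:

$$\sum_{j=1}^{K} m_j = L \quad \text{and} \quad \bar{m} = \frac{1}{K}\sum_{j=1}^{K} m_j$$

Further let:

$$c_{\cdot j} = \frac{1}{m_j}\sum_{i=1}^{m_j} c_{ij} \quad \text{and} \quad \bar{c} = \frac{1}{L}\sum_{j=1}^{K}\sum_{i=1}^{m_j} c_{ij}$$

The link error, taking into account the clustering of items, was then computed as follows:

$$\mathrm{LinkError}_{2018,2013} = \sqrt{\frac{\sum_{j=1}^{K} m_j^2 \left(c_{\cdot j} - \bar{c}\right)^2}{K(K-1)\,\bar{m}^2}} = \sqrt{\frac{K}{K-1} \cdot \frac{\sum_{j=1}^{K} m_j^2 \left(c_{\cdot j} - \bar{c}\right)^2}{L^2}}$$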
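The link error formula translates directly into code. This Python sketch assumes each module's differences $c_{ij}$ are supplied per score point (so $m_j$ is the length of that module's list); the function name and module data are hypothetical.

```python
import math


def clustered_link_error(c_by_module):
    """Linking error with items clustered in modules, following the formula above.

    c_by_module holds, for each module j, the differences
    c_ij = delta_ij(2018) - delta_ij(2013) for its score points, so m_j = len(c_by_module[j]).
    """
    K = len(c_by_module)
    if K < 2 or any(len(module) == 0 for module in c_by_module):
        raise ValueError("need at least two non-empty modules")
    sizes = [len(module) for module in c_by_module]
    L = sum(sizes)
    m_bar = L / K
    c_bar = sum(sum(module) for module in c_by_module) / L
    module_means = [sum(module) / len(module) for module in c_by_module]
    weighted_ss = sum(m * m * (c_j - c_bar) ** 2 for m, c_j in zip(sizes, module_means))
    return math.sqrt(weighted_ss / (K * (K - 1) * m_bar ** 2))


# Hypothetical differences (logits) for three modules with 2, 1, and 3 score points.
example = [[0.10, 0.20], [-0.10], [0.00, 0.05, -0.05]]
print(round(clustered_link_error(example), 4))  # 0.0585
```

A uniform shift of all link items (every $c_{ij}$ equal) gives a link error of zero, since such a shift is absorbed by the equating transformation itself.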




The development of proficiency levels for CIL 


One of the objectives of ICILS was to establish a described CIL scale that would become a reference point for future international assessments in this learning area. Establishing proficiency levels of CIL is an informative way of describing student performance across countries and also sets benchmarks for future surveys.

Students whose results are located within a particular level of proficiency are typically able to demonstrate certain understandings and skills that are associated with that level. These students also typically possess the understandings and skills defined as applying at lower proficiency levels.


When developing proficiency levels, we applied a method that ensured that the notion of “being at a level” could be interpreted consistently and in line with the fact that the achievement scale is a continuum. We therefore attempted to provide a common understanding of what being at a level meant and to ensure that this meaning was consistent across different proficiency levels. This method took the following three questions into account:

• What is the expected success of a student at a particular level on a test containing items at that level?

• What is the width of the levels in that scale?

• What is the probability that a student in the middle of a level will correctly answer an item of average difficulty for that level?


We adopted the following two parameters for defining proficiency levels to create the properties described below:

• The response probability (rp) for reporting item parameters: this was set at rp = 0.62 (providing a more appropriate level of “mastery” than rp = 0.5).

• The width of the proficiency levels: this was set at 0.8 logits (i.e., on the scale of the original latent trait scores from the IRT scaling model, prior to their transformation to the reporting metric).


Using these parameters, we were able to infer the following about students' CIL in relation to the proficiency levels:

• Students whose results placed them at the lowest possible point of a proficiency level were likely to correctly answer (on average) slightly over 50 percent of the items on a (hypothetical) test made up of items with locations spread uniformly across the level.

• Students whose results placed them at the lowest possible point of a proficiency level had a 62 percent probability of giving the correct response to an item at the bottom end of the proficiency level.

• Students whose results placed them at the top of the proficiency level had a 78 percent probability of correctly responding to an item at the bottom end of the proficiency level.
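The three properties listed above follow from the Rasch model with rp = 0.62 and a level width of 0.8 logits. The short Python check below reproduces them; it assumes that an item's reported location at rp = 0.62 equals its Rasch difficulty plus ln(0.62/0.38), and the variable names are hypothetical.

```python
import math

RP = 0.62           # response probability used to locate items
LEVEL_WIDTH = 0.8   # width of a proficiency level in logits


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


# An item located at scale point b (rp = 0.62) has Rasch difficulty b - ln(RP / (1 - RP)),
# so a student at b answers it correctly with probability RP.
shift = math.log(RP / (1.0 - RP))

p_bottom_student_bottom_item = logistic(shift)
p_top_student_bottom_item = logistic(shift + LEVEL_WIDTH)
# Average success of a student at the bottom of a level on items spread uniformly across
# the level: closed-form mean of logistic(shift - u) for u in [0, LEVEL_WIDTH].
expected_success_bottom = (
    math.log1p(math.exp(shift)) - math.log1p(math.exp(shift - LEVEL_WIDTH))
) / LEVEL_WIDTH

print(round(p_bottom_student_bottom_item, 2))  # 0.62
print(round(p_top_student_bottom_item, 2))     # 0.78
print(round(expected_success_bottom, 2))       # 0.52
```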


The approach that was chosen was essentially an attempt to apply an appropriate choice of mastery 
by placing item locations at rp = 0.62 while simultaneously ensuring that the approach would be 
understood by the readers of ICILS reports. 


The international research team identified and described four proficiency levels that could be 
used when reporting student performances in CIL from the assessment. Figure 11.9 shows the 
cut-points for these levels (in logits and final scale scores). The figure also cites the percentage of 
students at each proficiency level across the participating ICILS countries. 




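Figure 11.9: CIL proficiency level cut-points and percentage of students at each level
[Chart: proficiency levels on the rp = 0.62 logit scale and the CIL reporting scale, with visible cut-points at 576 and 661 scale points; percentage of students: Level 4, 2%; Level 3, 19%; Level 2, 36%; Level 1, 25%; below Level 1, 18%]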


When reporting released CIL items and mapping them against proficiency levels, we had to transform the location parameters of these items to a value that reflected a response probability of 62 percent (rp = 0.62). This is achieved by adding the natural log of the odds of a 62 percent chance to the original log odds and transforming the result to the international metric by applying the same transformation as for the (original) student scores. The standardized item difficulty $\delta'_i$ for each CIL item was obtained as follows:
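$$\delta'_i = 500 + 100 \times \frac{(1.0258 \times \delta_i + 0.2127) + \ln\left(\frac{0.62}{0.38}\right) - \mu_{\theta(\mathrm{CIL13})}}{\sigma_{\theta(\mathrm{CIL13})}}$$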


Here, $\delta_i$ is the item difficulty in its original metric, $\mu_{\theta(\mathrm{CIL13})}$ is the ICILS 2013 average of students' CIL logit scores (-0.119), and $\sigma_{\theta(\mathrm{CIL13})}$ is its corresponding standard deviation (1.186); both were used to standardize the plausible values. As the CIL item difficulty parameters ($\delta_i$) were calibrated based on the combined set of old and new items, the same equating transformation as for student CIL scores had to be applied before transforming them to the CIL reporting metric at rp = 0.62.
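The item-location formula can be sketched as a single Python function. The constants come from the formula above and the function name is hypothetical; applying it to B01Z (item parameter -0.939 in Table 11.10) only illustrates the arithmetic and is not a reproduction of a published item map.

```python
import math

EQUATING_SLOPE, EQUATING_INTERCEPT = 1.0258, 0.2127
MU_CIL13, SD_CIL13 = -0.119, 1.186
RP_SHIFT = math.log(0.62 / 0.38)  # natural log of the odds for a 62 percent chance


def cil_item_location_rp62(delta):
    """Reporting-scale location of a CIL item at rp = 0.62 (sketch of the formula above)."""
    equated = EQUATING_SLOPE * delta + EQUATING_INTERCEPT
    return 500.0 + 100.0 * (equated + RP_SHIFT - MU_CIL13) / SD_CIL13


print(round(cil_item_location_rp62(-0.939), 1))  # 488.0
```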




References 


Adams, R. (2002). Scaling PISA cognitive data. In R. Adams, & M. Wu (Eds.), Technical report for the OECD 
Programme for International Student Assessment (pp. 99-108). Paris, France: OECD Publications. 


Adams, R. J., & Wu, M. L. (2009). The construction and implementation of user-defined fit tests for use 
with marginal maximum likelihood estimation and generalized item response models. Journal of Applied 
Measurement, 10(4), 1-16. 
Adams, R., Wu, M., & Macaskill, G. (1997). Scaling methodology and procedures for the mathematical and 
science scales. In M. O. Martin, & D. L. Kelly (Eds.), TIMSS technical report: Vol. II. Implementation and analysis: 
Primary and middle school years. Chestnut Hill, MA: Boston College. 


Adams, R. J., Wu, M. L., & Wilson, M. R. (2015). ACER ConQuest: Generalised Item Response Modelling Software [computer software]. Version 4. Camberwell, Victoria: Australian Council for Educational Research (ACER).
Foy, P., & Yin, L. (2016). Scaling the TIMSS 2015 achievement data. In M. O. Martin, I. V. S. Mullis, & M. Hooper (Eds.), Methods and procedures in TIMSS 2015 (pp. 13.1-13.62). http://timss.bc.edu/publications/timss/2015-methods/chapter-13.html
Foy, P., & Yin, L. (2017). Scaling the PIRLS 2016 achievement data. In M. O. Martin, I. V. S. Mullis, & M. Hooper (Eds.), Methods and procedures in PIRLS 2016 (pp. 12.1-12.38). https://timssandpirls.bc.edu/publications/pirls/2016-methods/chapter-12.html
Fraillon, J., Ainley, J., Schulz, W., Friedman, T., & Duckworth, D. (2019). IEA International Computer and Information Literacy Study 2018 assessment framework. Cham, Switzerland: Springer. https://www.springer.com/gp/book/9783030193881
Fraillon, J., Ainley, J., Schulz, W., Friedman, T., & Duckworth, D. (2020). Preparing for life in a digital world. IEA 
International Computer and Information Literacy Study 2018 international report. Cham, Switzerland: Springer. 
https://www.springer.com/gp/book/9783030387808 

Gebhardt, E., & Schulz, W. (2015). Scaling procedures for ICILS test items. In J. Fraillon, W. Schulz, T. Friedman, J. Ainley, & E. Gebhardt (Eds.), International Computer and Information Literacy Study 2013 technical report (pp. 155-176). Amsterdam, The Netherlands: International Association for the Evaluation of Educational Achievement (IEA).
Gebhardt, E., & Schulz, W. (2018). Scaling procedures for ICCS 2016 test items. In W. Schulz, R. Carstens, B. Losito, & J. Fraillon (Eds.), ICCS 2016 technical report (pp. 117-137). Amsterdam, the Netherlands: International Association for the Evaluation of Educational Achievement (IEA).
Masters, G. N., & Wright, B. D. (1997). The partial credit model. In W. J. van der Linden, & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 101-122). New York/Berlin/Heidelberg: Springer.

Mislevy, R. J. (1991). Randomization-based inference about latent variables from complex samples. Psychometrika, 56, 177-196.
Mislevy, R. J., & Sheehan, K. M. (1987). Marginal estimation procedures. In A. E. Beaton (Ed.), The NAEP 1983-1984 technical report (Report No. 15-TR-20, pp. 293-360). Princeton, NJ: Educational Testing Service.
Monseur, C., & Berezner, A. (2007). The computation of equating errors in international surveys in education. Journal of Applied Measurement, 8(3), 323-335.

Quittre, V., & Monseur, C. (2010). Exploring local item dependency for items clustered around common reading 
passage in PIRLS data. Paper presented at the fourth IEA International Research Conference, Gothenburg, 
Sweden, 1-3 July. Retrieved from https://www.iea.nl/sites/default/files/2019-04/ 

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: 
Nielsen & Lydiche. 
Rost, J., & von Davier, M. (1994). A conditional item-fit index for Rasch models. Applied Psychological 
Measurement, 18(2), 171-182. 


von Davier, M., Gonzalez, E., & Mislevy, R. (2009). What are plausible values and why are they useful? IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments, vol. 2, 9-36.


Wright, B. D., & Masters, G. N. (1982). Rating scale analysis: Rasch measurement. Chicago, IL: Mesa Press. 


CHAPTER 12: 


Scaling procedures for ICILS 2018 
questionnaire items 


Wolfram Schulz and Tim Friedman 


Introduction 


This chapter describes the procedures for the scaling of the ICILS 2018 questionnaire data 
(collected from students, teachers, ICT coordinators, and school principals) and the indices that 
were derived from these data. 


When describing the ICILS 2018 questionnaire indices, we distinguish the following two general types of indices:
• Simple indices were constructed through arithmetical transformation or recoding, for example an index of immigration background based on information about the country of birth of students and their parents; and
• Scale indices were derived through the scaling of items, which was typically achieved by using item response modeling of items with two or more categories.


The chapter is divided into three main sections. The first section lists the simple indices that were derived from ICILS 2018 data and describes how they were created. The second section outlines the procedures used for the scaling of questionnaire data in ICILS 2018, and the third section lists the scaled indices with statistical information on the factor structure of related item sets, scale reliabilities, and parameters used for the item response theory (IRT) scaling.


Results from an analysis of cross-country validity of item dimensionality and constructs were 
already part of ICILS 2018 field trial analyses. The international study center at ACER conducted 
reviews of the extent to which measurement models were equivalent across participating countries 
for draft item material. To conduct this review we made use of exploratory as well as confirmatory 
factor analysis and item response modeling to examine cross-national measurement equivalence 
before the final selection of main survey questionnaire items (see examples of this type of analysis 
in Schulz 2009, 2017). 


Simple indices 


Student questionnaire 


Student age (S_AGE) was calculated as the difference between the year and month of testing and the year and month of a student's birth. Information from the student questionnaire (Question 1) was used to derive age, except for students for whom this information was missing. In these cases, information from student tracking forms (see Chapter 10 for more details) provided data for the calculation of this index.


The formula for computing S_AGE was: 


$$\mathrm{S\_AGE} = (T_y - S_y) + \frac{T_m - S_m}{12}$$

where $T_y$ and $S_y$ are, respectively, the year of the test and the year of birth of the tested student, in four-digit format (e.g., "2018" or "2005"), and $T_m$ and $S_m$ are, respectively, the month of the test and the month of the student's birth. The result was rounded to two decimal places.
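A minimal Python sketch of the S_AGE computation, including the fallback to tracking-form data described above. The function and parameter names are hypothetical, and the sketch assumes that when either birth year or birth month is missing from the questionnaire, both values are taken from the tracking form.

```python
from typing import Optional


def s_age(test_year: int, test_month: int,
          birth_year: Optional[int], birth_month: Optional[int],
          tracking_year: Optional[int] = None,
          tracking_month: Optional[int] = None) -> Optional[float]:
    """Student age in years: (T_y - S_y) + (T_m - S_m) / 12, rounded to two decimals.

    Questionnaire birth data are used first; tracking-form data are used
    when the questionnaire information is missing (assumed rule).
    """
    if birth_year is None or birth_month is None:
        birth_year, birth_month = tracking_year, tracking_month
    if birth_year is None or birth_month is None:
        return None  # no usable birth date from either source
    return round((test_year - birth_year) + (test_month - birth_month) / 12, 2)


if __name__ == "__main__":
    # Tested in March 2018, born in July 2004
    print(s_age(2018, 3, 2004, 7))
```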


In Question 2, students were asked about their gender. Responses were recorded as the sex of student (S_SEX), coded as girl (1) or boy (0). For students with omitted data for this question, the gender recorded on the tracking form was used.




Question 3 asked students about their expected highest level of educational attainment. The responses were classified using the International Standard Classification of Education (ISCED) framework (UNESCO 2012). The corresponding index of students' expected education (S_ISCED) had the following categories:

(0) No completion of ISCED level 2;
(1) Completion of ISCED level 2 (lower-secondary);
(2) Completion of ISCED level 3 (upper-secondary);
(3) ISCED level 4 (non-tertiary post-secondary) or ISCED level 5 (vocational tertiary); and
(4) ISCED level 6 (bachelor's level tertiary) or ISCED level 7 (master's level tertiary) or ISCED level 8 (doctorate level tertiary).

The ICILS student questionnaire collected information on the country of birth of the students and their parents (Question 4). National research coordinators (NRCs) were asked to adapt the terms [Parent or guardian 1] and [Parent or guardian 2] to an appropriate term for their national context. Instructions were provided to students on who to select as a parent for each corresponding option (including how to respond if they spend time with more than one set of parents and if they only spend time with one parent). For each student (S_SBORN), their first parent (S_P1BORN), and their second parent (S_P2BORN), a code of 1 was given if they were born in the country of the assessment, while a code of 2 was assigned if they were born outside the country of assessment. The index of immigrant background (S_IMMIG) was created using these three indicator variables and had three categories:

(0) Native students (students who had at least one parent born in the country of assessment);
(1) First-generation students (students who were born in the country of assessment and whose parent(s) were born in another country); and
(2) Non-native students (students who were born outside the country of assessment and whose parent(s) were born in another country).

We assigned missing values to students with missing responses for either their own place of birth or that of all parents, or for all three questions. The analyses of the relationship of immigrant background with computer and information literacy (CIL) and computational thinking (CT) were based on a dichotomous indicator variable, called S_IMMBGR, that distinguished between students from immigrant families (coded 1) and students from non-immigrant families (coded 0).

Question 5 of the ICILS student questionnaire asked students if the language spoken at home most of the time was the language of assessment or another language.¹ We used this information to derive an index on home language (S_TLANG), in which responses were grouped into two categories:

(1) The language spoken at home most of the time was the language of assessment; and
(0) The language spoken at home most of the time differed from the language of assessment.

Occupational data for students' parents² were obtained by first asking students whether their parents were in paid work or not (Questions 6 and 10), and then asking students to provide details about their parents' jobs, using a pair of open-ended questions (Questions 7/8 and 11/12). The paid work status of the students' parents was classified into S_P1WORK (parent 1) and S_P2WORK (parent 2), with 1 indicating that the parent was in paid work and 0 indicating that the parent was not in paid work. The open-ended responses were coded by national centers into four-digit codes using the International Standard Classification of Occupations (ISCO-08) framework (International Labour Organization 2007). These codes are contained in the indices S_P1ISCO


1 Most countries collected more detailed information on language use. This information is not included in the international database.

2 Students could complete parental occupation questions for up to two parents.


SCALING PROCEDURES FOR ICILS 2018 QUESTIONNAIRE ITEMS 161 


and S_P2ISCO for the students' parents. We then mapped these codes to the international socioeconomic index of occupational status (ISEI) (Ganzeboom et al. 1992). The three indices that we obtained from these scores were occupational status of the first parent (S_P1ISEI), occupational status of the second parent (S_P2ISEI), and the highest occupational status of both parents (S_HISEI), with the latter corresponding to the higher ISEI score of either parent or to the only available parent's ISEI score. For all three indices, higher scores indicate higher levels of occupational status.


Questions 9 and 13 asked about the highest level of educational attainment of the students' parents and provided the data for measuring another important family background variable. The core difficulties with this variable relate to international comparability (education systems differ widely across countries and over time within countries) and response validity (students are often unable to accurately report their parents' levels of education). Levels of parental education were classified according to the ISCED levels.


Recoding educational qualifications into the following categories provided indices of highest 
parental educational attainment: 


(0) Did not complete ISCED level 2; 

(1) ISCED level 2 (lower-secondary); 

(2) ISCED level 3 (upper-secondary); 

(3) ISCED level 4 (non-tertiary post-secondary) or ISCED level 5 (vocational tertiary); and 

(4) ISCED level 6 (bachelor's level tertiary) or ISCED level 7 (master's level tertiary) or ISCED level 8 (doctorate level tertiary).


Indices with these categories are available for each student’s first parent (S_P1ISCED) and second 
parent (S_P2ISCED). The index for highest educational level of parental education (S_HISCED) 
corresponds to the higher ISCED level of either parent. 
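The recoding into the five categories and the "higher of either parent" rule for S_HISCED can be sketched as follows. This is an illustration only: it assumes each parent's response has already been mapped to an ISCED 2011 level from 0 to 8 (in practice national response options are mapped to ISCED), and the function names are hypothetical.

```python
from typing import Optional


def isced_category(level: Optional[int]) -> Optional[int]:
    """Recode an ISCED 2011 level (0-8) into the five parental-education categories."""
    if level is None:
        return None
    if not 0 <= level <= 8:
        raise ValueError("ISCED level must be between 0 and 8")
    if level < 2:
        return 0          # did not complete ISCED level 2
    if level in (2, 3):
        return level - 1  # 2 -> 1 (lower-secondary), 3 -> 2 (upper-secondary)
    if level in (4, 5):
        return 3          # non-tertiary post-secondary or vocational tertiary
    return 4              # bachelor's, master's, or doctorate level tertiary


def s_hisced(parent1_level: Optional[int], parent2_level: Optional[int]) -> Optional[int]:
    """Higher category of either parent, or the only available parent's category."""
    categories = [c for c in (isced_category(parent1_level), isced_category(parent2_level))
                  if c is not None]
    return max(categories) if categories else None
```

The same "maximum of the available values" rule carries over to S_HISEI, with ISEI scores in place of ISCED categories.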


Question 14 of the ICILS student questionnaire asked students how many books they had in their 
homes. Responses to this question formed the basis for an index of students’ home literacy resources 
(S_HOMLIT) with the following categories: 


(0) 0 to 10 books;
(1) 11 to 25 books;
(2) 26 to 100 books;
(3) 101 to 200 books; and
(4) More than 200 books.


Question 15B of the ICILS student questionnaire was an international option question where 
students were asked if they have an internet connection at home. The index (S_INTNET) was recorded 
as Yes (1) or No (0). 


In Question 16, students were asked their number of years of experience in using different types 
of ICT devices. The three types of devices included: 


• Desktop or laptop computers;
• Tablet devices or e-readers (e.g., iPad, Tablet PC, Kindle); and
• Smartphones except for using text and calling.


Responses to these items were coded into three respective indices: computer experience in years (S_EXCOMP), tablet experience in years (S_EXTAB), and smartphone experience in years (S_EXSMART), with the following response options available:

(0) Never or less than one year;
(1) At least one year but less than three years;
(2) At least three years but less than five years;
(3) At least five years but less than seven years; and
(4) Seven years or more.

The last question of the student questionnaire, Question 30, asked students whether they studied a computing subject (computing, computer science, information technology, informatics or similar) during the current school year. Responses were used to derive an index of ICT studies in the current school year (S_ICTSTUD), which was recorded as Yes (1) or No (0).


Teacher questionnaire


The sex of teacher (T_SEX) was computed from the data captured from Question 1 of the teacher questionnaire. Female teachers were coded as 1, whereas male teachers were coded as 0.

Teacher age (T_AGE) consisted of the midpoint of the age ranges given in Question 2 of the teacher questionnaire. We assigned "less than 25" a value of 23 and coded "60 or over" as 63.

Question 4 asked teachers to indicate the number of schools in which they teach the target grade population for the current school year. Data captured from this question were used to calculate the allocation of teachers' staff time to the sampled school (T_WGT), which was coded as:

(1.00) Only in the sampled school;
(0.50) In the sampled school and another school;
(0.33) In the sampled school and in two other schools; and
(0.25) In the sampled school and in three or more other schools.

Question 5 of the teacher questionnaire asked teachers to indicate how long they have been using computers for teaching purposes. They were asked to distinguish between ICT experience with ICT use during lessons (T_EXLES) and ICT experience with ICT use for preparing lessons (T_EXPREP). These indices were coded as:

(0) Never;
(1) Less than two years;
(2) Between two and five years; and
(3) More than five years.


School questionnaires


The first question of the principal questionnaire asked whether respondents were male or female. This was used to form the index sex of principal (P_SEX), where female principals were coded as 1 and male principals as 0.


The ICILS school principal questionnaire asked in Question 3 about the number of girls and boys in the entire school (IP2G03A, IP2G03B) and in Question 4 about the enrolled girls and boys at the target grade (IP2G04A, IP2G04B). The numbers given for each gender group were summed to form an index of the number of students in the entire school (P_NUMSTD) and of the number of students in the target grade (P_NUMTAR).³


Question 5 asked principals to report the lowest (youngest) grade (IP2G05A) and the highest (oldest) grade (IP2G05B) taught in their school. The difference between these two grades was calculated as the number of grades in school (P_NGRADE).


3 These indices will be included in the ICILS 2018 Restricted Use Data file, while the ICILS 2018 Public Use Data file will include these variables in a categorized form as PNUMSTD_CAT (1 = 1-300, 2 = 301-600, 3 = 601-900, 4 = more than 900) and PNNUMTAR_CAT (1 = 1-100, 2 = 101-200, 3 = more than 200).




Question 6 collected information on the number of teachers (P_NUMTCH) in a school. The index was calculated by summing the total number of full-time teachers (IP2G06A) and the total number of part-time teachers weighted at 50 percent (0.5 × IP2G06B).⁴ The ratio of school size and teachers (P_RATTCH) was calculated by dividing the number of teachers (P_NUMTCH) by the number of students (P_NUMSTD) in a school.


Question 8A collected information about whether the school was a public or private school. This was used to form a private school indicator (P_PRIV), where public schools are coded as 0 and private schools as 1.⁵


Question 8B asked principals to estimate the percentage of students at their school coming from economically affluent homes and those from economically disadvantaged homes (“0-10%,” “11-25%,” “26-50%,” and “more than 50%” for each home type). We used the responses to compute an indicator of school composition by student background (P_COMP), where a value of 1 was assigned to “schools with more affluent than disadvantaged students,” 2 to “schools with neither more affluent nor more disadvantaged students,” and 3 to “schools with more disadvantaged than affluent students.”
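The text does not spell out how the two percentage estimates were compared. One plausible reading, assumed in the Python sketch below, is a direct comparison of the two ordinal response categories; the category labels follow Question 8B, and the function name `p_comp` is hypothetical.

```python
from typing import Optional

# Ordered response categories of Question 8B (lowest to highest share)
CATEGORIES = ["0-10%", "11-25%", "26-50%", "more than 50%"]


def p_comp(affluent: Optional[str], disadvantaged: Optional[str]) -> Optional[int]:
    """Hypothetical derivation of P_COMP by comparing the two ordinal categories.

    1 = more affluent than disadvantaged students,
    2 = neither more affluent nor more disadvantaged students,
    3 = more disadvantaged than affluent students.
    """
    if affluent is None or disadvantaged is None:
        return None
    a = CATEGORIES.index(affluent)
    d = CATEGORIES.index(disadvantaged)
    if a > d:
        return 1
    if a < d:
        return 3
    return 2
```

Comparing category positions rather than midpoints keeps the rule free of any assumption about where within each band the true percentage lies.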


Question 3 of the ICT coordinator questionnaire asked respondents to indicate the ICT experience 
in years in the school (C_EXP). These were recoded as: 


(0) Never, we do not use ICT;
(1) Fewer than 5 years;
(2) At least 5 but fewer than 10 years; and
(3) 10 years or more.


Question 7 of the ICT coordinator questionnaire collected data on the number of desktop 
computers, the number of laptop/notebooks, and the number of tablet devices in the school. The 
sum of ICT devices (C_ICTDEV) was derived by adding these numbers. Respondents were also 
asked to indicate the number of these different types of devices that were available for student use. 
These were summed to derive an index of sum of ICT devices available for student use (C_ICTSTD). 
The question also asked respondents to indicate the number of school-provided smart boards or 
interactive white boards available in the school. In conjunction with the number of students at 
school (P_NUMSTD) these data also provided the following ratios: 


• Ratio of school size and number of ICT devices (C_RATDEV) = Number of students in the school (P_NUMSTD) / number of ICT devices in the school altogether (C_ICTDEV).
• Ratio of school size and number of devices available for students (C_RATSTD) = Number of students in the school (P_NUMSTD) / number of ICT devices in the school that are available to students (C_ICTSTD).
• Ratio of school size and smart boards (C_RATSMB) = Number of students in the school (P_NUMSTD) / number of smart boards or interactive white boards available (II2G07C).
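The school-level sums and device ratios above can be sketched in Python as follows. The container class and its field names are hypothetical stand-ins for the questionnaire variables, and returning `None` when a denominator is zero is an assumed design choice (the text does not say how such schools were handled).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SchoolCounts:
    """Hypothetical container for raw counts reported by the principal and ICT coordinator."""
    girls: int                  # principal Question 3: girls in the entire school
    boys: int                   # principal Question 3: boys in the entire school
    desktops: int               # ICT coordinator Question 7: devices in the school
    laptops: int
    tablets: int
    desktops_for_students: int  # devices of each type available for student use
    laptops_for_students: int
    tablets_for_students: int
    smart_boards: int           # smart boards or interactive white boards


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def school_indices(s: SchoolCounts) -> Dict[str, Optional[float]]:
    p_numstd = s.girls + s.boys
    c_ictdev = s.desktops + s.laptops + s.tablets
    c_ictstd = s.desktops_for_students + s.laptops_for_students + s.tablets_for_students
    return {
        "P_NUMSTD": p_numstd,
        "C_ICTDEV": c_ictdev,
        "C_ICTSTD": c_ictstd,
        "C_RATDEV": safe_ratio(p_numstd, c_ictdev),        # students per device
        "C_RATSTD": safe_ratio(p_numstd, c_ictstd),        # students per student device
        "C_RATSMB": safe_ratio(p_numstd, s.smart_boards),  # students per smart board
    }
```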


4 This index will be included in the ICILS 2018 Restricted Use Data, while the ICILS 2018 Public Use Data will include this 
variable in a categorized form as PNNUMTCH_CAT (1 = 1-25, 2 = 26-50, 3 = 51-75, 4 = more than 75). 
5 This index will only be included in the ICILS 2018 Restricted Use Data. 


Scaling procedures


Review of item statistics


As part of the scaling analyses of questionnaire data, we reviewed reliabilities both overall and for national samples using Cronbach's alpha coefficient as an estimate of the internal consistency of each scale (Cronbach 1951). When reviewing reliabilities, we regarded values above 0.7 as indicating satisfactory internal consistency and those above 0.8 as showing high degrees of reliability (see, for example, Nunnally and Bernstein 1994, pp. 264-265). Apart from scale reliabilities, this analysis stage also considered the percentages of missing responses (which tended to be very low in most cases) as well as the correlations between individual items and the scale scores based on all other items in a scale (adjusted item-total correlations).


Confirmatory factor analysis


Structural equation modeling (SEM) (Kaplan 2009) provides a tool for modeling and confirming theoretically expected dimensions measured with sets of student, teacher, or school questionnaire data. At the field trial stage, it can also be used to re-specify originally expected dimensional structures.⁶ When using confirmatory factor analysis, researchers acknowledge the need to employ a theoretical model of item dimensionality that can be tested via the collected data. Within the SEM framework, latent variables link to observable variables via measurement equations. An observed variable x is thus modeled as:

$$x = \Lambda_x \xi + \delta$$

where $\Lambda_x$ is a $q \times k$ matrix of factor loadings, $\xi$ denotes the latent variable(s), and $\delta$ is a $q \times 1$ vector of unique error variables. The expected covariance matrix is fitted according to the theoretical factor structure.

When conducting the confirmatory factor analyses for ICILS 2018 questionnaire data, selected model-fit indices provided measures of the extent to which a particular model with an assumed a-priori structure had a good fit with regard to the observed data. For the ICILS 2018 analysis, the assessment of model fit was primarily conducted through reviews of the root-mean square error of approximation (RMSEA), the comparative fit index (CFI), and the non-normed fit index (NNFI), all of which are less affected than other indices by sample size and model complexity (see Bollen and Long 1993).

We interpreted RMSEA values as indicating unacceptable model fit when over 0.10, marginally satisfactory fit between 0.08 and 0.10, satisfactory fit between 0.05 and 0.08, and a close fit when lower than 0.05 (see MacCallum et al. 1996). As additional fit indices, CFI and NNFI are bound between 0 and 1. Values below 0.90 indicate a non-satisfactory model fit, whereas values greater than 0.95 were interpreted as suggesting a close model fit (see Bentler and Bonett 1980; Hu and Bentler 1999).

In addition to these fit indices, reviews of standardized factor loadings and the corresponding residual item variances provide further evidence of model fit for questionnaire data. Standardized factor loadings $\lambda'$ can be interpreted in the same way as standardized regression coefficients by assuming that indicator variables are regressed on an underlying latent factor. The loadings reflect the extent to which each indicator measures the underlying construct. Squared standardized factor loadings indicate how much variance in an indicator variable can be explained by the latent factor and are related to the (standardized) residual variance estimate $\delta'$ (reflecting the estimated proportion of unexplained variance) as:

$$\delta' = 1 - \lambda'^2$$


6 In the initial stages of field trial analyses, we also employed exploratory factor analysis (Tucker and MacCallum 1997; Fabrigar et al. 1999) to determine item dimensionality of larger item pools.
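The interpretation bands for RMSEA, CFI, and NNFI and the relation between standardized loadings and residual variances can be written as small Python helpers. This is a sketch under stated assumptions: the text does not say which band a value exactly on a cut-off belongs to (here 0.08 and 0.10 go to the better-fitting band), and it does not name the 0.90 to 0.95 range for CFI/NNFI, so that range is returned as "intermediate". All function names are hypothetical.

```python
def classify_rmsea(rmsea: float) -> str:
    """RMSEA bands described in the text (boundary handling is an assumption)."""
    if rmsea < 0.05:
        return "close fit"
    if rmsea <= 0.08:
        return "satisfactory"
    if rmsea <= 0.10:
        return "marginally satisfactory"
    return "unacceptable"


def classify_incremental(index: float) -> str:
    """Bands for CFI or NNFI: below 0.90 non-satisfactory, above 0.95 close fit."""
    if index < 0.90:
        return "non-satisfactory"
    if index > 0.95:
        return "close fit"
    return "intermediate"  # 0.90 to 0.95, not labeled in the text


def residual_variance(std_loading: float) -> float:
    """Standardized residual variance delta' = 1 - lambda'^2."""
    if not -1.0 <= std_loading <= 1.0:
        raise ValueError("standardized loading must lie in [-1, 1]")
    return 1.0 - std_loading ** 2
```

For example, an item with a standardized loading of 0.8 has 64 percent of its variance explained by the factor and a residual variance of 0.36.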




The use of multidimensional models also allows an assessment of the estimated correlation(s) between latent factors, which provide(s) information on the similarity of the different dimensions measured by related item sets.
Generally, maximum likelihood estimation and covariance matrices are not appropriate for analyses 
of (categorical) questionnaire items because the approach treats items as if they were continuous. 
Therefore, the analyses of ICILS 2018 relied on robust weighted least squares estimation (see 
Muthén et al. 1997; Flora and Curran 2004) to estimate the confirmatory factor models. The 
software package used for the estimations was MPLUS 7 (Muthén and Muthén 2012). 


Confirmatory factor analyses were carried out for sets of conceptually related questionnaire items that measured one or more different dimensions. This approach allowed an assessment of the measurement model as well as of the associations between related latent factors. The scaling analyses were restricted to data from those countries that met sample participation requirements (see Chapter 8 for further information). National samples of students, teachers, and schools received weights that ensured equal representation of countries in the analyses.


In international studies, model parameters may vary across countries, and it may not be appropriate to assume the same factor structure for each population. To test parameter invariance, multiple-group modeling, as an extension of CFA, offers an approach to test the equivalence of measurement models across sub-samples (Little 1997; Byrne 2008).


When considering a model where respondents belong to different groups indexed as $g = 1, 2, \ldots, G$, the multiple-group factor model becomes

$$x_g = \Lambda_{x,g} \xi_g + \delta_g$$


A test of factorial invariance ($H_0$), where factor loadings are defined as being equal (often referred to as "metric equivalence") (Horn and McArdle 1992), can be defined as

$$H_0: \Lambda_{x,1} = \Lambda_{x,2} = \Lambda_{x,3} = \ldots = \Lambda_{x,G}$$


Typically, model-fit indices are compared across different multiple-group models, each with an increasing degree of constraints, ranging from relaxed models with no constraints through to constrained models with largely invariant model parameters.


In this report, for all student and teacher questionnaire scales,⁷ three different multiple-group models for CFA were estimated with different levels of constraints on parameters:

A. Unconstrained models where all parameters are estimated as country-specific (configural invariance);
B. Models with constrained factor loadings across countries (metric invariance); and
C. Models with constraints on factor loadings and intercepts (scalar invariance).


Models with confirmed scalar invariance are the only ones that ensure full comparability of measurement models across participating countries. When comparing model fit across the three conditions, it needs to be acknowledged that with data from large samples, as is typically the case in international large-scale assessments, even very small differences tend to be statistically significant. This makes hypothesis testing using tests of statistical significance difficult; therefore, when reviewing results, it is more appropriate to focus on relative changes in model fit across the three models with different levels of constraints.


7 For school questionnaire data, we did not conduct this type of analysis given the relatively small size of national samples.
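Focusing on relative changes in fit across the configural, metric, and scalar models can be illustrated with the standard chi-square based formulas for RMSEA and CFI. The Python sketch below is an assumption-laden illustration, not the Mplus output used in ICILS: it uses the single-group RMSEA formula (some software applies a correction for the number of groups), a single baseline (independence) model for all three CFI values, and hypothetical fit statistics in the usage example.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class ChiSquareFit:
    chi2: float
    df: int


def rmsea(fit: ChiSquareFit, n: int) -> float:
    """Single-group RMSEA: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(fit.chi2 - fit.df, 0.0) / (fit.df * (n - 1)))


def cfi(fit: ChiSquareFit, baseline: ChiSquareFit) -> float:
    """CFI = 1 - max(chi2_M - df_M, 0) / max(chi2_B - df_B, chi2_M - df_M, 0)."""
    d_model = max(fit.chi2 - fit.df, 0.0)
    d_base = max(baseline.chi2 - baseline.df, d_model)
    return 1.0 - d_model / d_base if d_base > 0 else 1.0


def compare_invariance(models: Dict[str, ChiSquareFit], baseline: ChiSquareFit,
                       n: int, reference: str = "configural") -> Dict[str, Dict[str, float]]:
    """RMSEA and CFI per model plus the change in CFI relative to the reference model."""
    ref_cfi = cfi(models[reference], baseline)
    report = {}
    for name, fit in models.items():
        model_cfi = cfi(fit, baseline)
        report[name] = {"rmsea": rmsea(fit, n), "cfi": model_cfi,
                        "delta_cfi": model_cfi - ref_cfi}
    return report


if __name__ == "__main__":
    # Hypothetical fit statistics, for illustration only
    models = {
        "configural": ChiSquareFit(150.0, 50),
        "metric": ChiSquareFit(190.0, 70),
        "scalar": ChiSquareFit(420.0, 90),
    }
    for name, stats in compare_invariance(models, ChiSquareFit(4000.0, 120), n=1001).items():
        print(name, {k: round(v, 3) for k, v in stats.items()})
```

Reporting the change relative to the configural model puts the emphasis on how much fit deteriorates as constraints are added, rather than on significance tests that are almost always rejected in large samples.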




Item response modeling 
In line with the scaling of test item data (see Chapter 11), item response modeling was used to scale questionnaire items. The one-parameter (Rasch) model (Rasch 1960) for dichotomous items models the probability of selecting item category 1 instead of 0 as:

$$P_i(\theta_n) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}$$

where $P_i(\theta_n)$ is the probability of person n scoring 1 on item i, $\theta_n$ is the estimated latent trait of person n, and $\delta_i$ is the estimated location of item i on this dimension. For each item, item responses are modeled as a function of the latent trait $\theta_n$.
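The dichotomous Rasch probability can be computed directly from the formula above. This short Python sketch uses the algebraically identical form 1 / (1 + exp(-(θ − δ))); the function name is hypothetical.

```python
import math


def rasch_probability(theta: float, delta: float) -> float:
    """P(score = 1) under the dichotomous Rasch model.

    Equivalent to exp(theta - delta) / (1 + exp(theta - delta)); this form
    avoids overflow when theta - delta is large and positive.
    """
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


if __name__ == "__main__":
    # A respondent located exactly at the item location has a 50 percent
    # chance of selecting category 1.
    print(rasch_probability(0.3, 0.3))
```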


In the case of items with more than two categories (as, for example, with Likert-type items), this model can be generalized to the (Rasch) partial credit model (Masters and Wright 1997), which takes the form of:

$$P_{xi}(\theta_n) = \frac{\exp \sum_{j=0}^{x} (\theta_n - \delta_i + \tau_{ij})}{\sum_{h=0}^{m_i} \exp \sum_{j=0}^{h} (\theta_n - \delta_i + \tau_{ij})}, \quad x = 0, 1, \ldots, m_i$$

where $P_{xi}(\theta_n)$ denotes the probability of person n scoring x on item i, $\theta_n$ denotes the person's latent trait, the item parameter $\delta_i$ describes the location of the item on the latent continuum, $\tau_{ij}$ provides an additional step parameter for step j of item i, and $m_i$ is the maximum score of item i.
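The category probabilities of the partial credit model can be sketched in Python as follows. The sketch assumes the common convention that the j = 0 term of each cumulative sum is zero, so category 0 has numerator exp(0) = 1. It follows the sign convention of the formula above (θ − δ + τ). The function name is hypothetical.

```python
import math
from typing import List, Sequence


def pcm_probs(theta: float, delta: float, taus: Sequence[float]) -> List[float]:
    """Category probabilities P_x, x = 0..m, under the partial credit model.

    taus[j - 1] holds the step parameter tau_ij for step j = 1..m.
    """
    log_numerators = [0.0]  # cumulative sum for x = 0 is taken as zero
    for tau in taus:
        log_numerators.append(log_numerators[-1] + (theta - delta + tau))
    top = max(log_numerators)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical four-category Likert item
    print([round(p, 3) for p in pcm_probs(0.5, 0.0, [1.0, 0.0, -1.0])])
```

With a single step parameter of zero the function reduces to the dichotomous Rasch model, which is a convenient check.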


Weighted mean-square statistics (infit), statistics based on model residuals, were used in 
conjunction with a wide range of further item statistics to assess the fit of the IRT model. ICILS 
2018 used the ACER ConQuest software package (Adams et al. 2015) for the analysis of item 
scaling properties and the estimation of item parameters. 


The international item parameters were derived using equally weighted national datasets:⁸

A. Calibration of item parameters for the student questionnaire: This was done based on a pooled database with equally weighted national samples from 11 countries that met sample participation requirements for the student survey.

B. Calibration of item parameters for the teacher questionnaire: This was done based on a pooled database with equally weighted national samples from seven countries that met sample participation requirements for the teacher survey.

C. Calibration of item parameters for the school questionnaire: This was done based on a pooled database with equally weighted national samples from 11 countries that met sample participation requirements for the student survey.


Following the estimation of international item parameters from the calibration sample, we used weighted likelihood estimation to obtain individual student, teacher, or school scores. Weighted likelihood estimates are computed by solving the equation:

$$r_n + \frac{J_n}{2I_n} - \sum_{i \in \Omega} \sum_{x=1}^{m_i} x \, \frac{\exp \sum_{j=0}^{x} (\theta_n - \delta_i + \tau_{ij})}{\sum_{h=0}^{m_i} \exp \sum_{j=0}^{h} (\theta_n - \delta_i + \tau_{ij})} = 0$$

for each case n, where $r_n$ is the sum score obtained from a set $\Omega$ of k items with j steps between adjacent categories. This can be achieved by applying the Newton-Raphson method. The term $J_n/2I_n$ (with $I_n$ being the information function for student n and $J_n$ being its derivative with respect to $\theta$) is used as a weight function to account for the bias inherent in maximum likelihood estimation (see Warm 1989). We used the ACER ConQuest software for the pre-calibration of item parameters in order to subsequently derive scale scores.
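To show how the Newton-Raphson solution of the weighted likelihood equation works, the Python sketch below handles the simplified case of dichotomous Rasch items only (the partial credit case adds step parameters but follows the same logic). It is not ConQuest's implementation. For dichotomous items, $I = \sum P(1-P)$ and $J = dI/d\theta = \sum P(1-P)(1-2P)$. The Newton steps are damped to at most one logit, which is a design choice added here for stability. Function names and item locations are hypothetical.

```python
import math
from typing import Sequence, Tuple


def _test_moments(theta: float, deltas: Sequence[float]) -> Tuple[float, float, float, float]:
    """Expected score, test information I, J = dI/dtheta, and dJ/dtheta for Rasch items."""
    expected = info = j = dj = 0.0
    for delta in deltas:
        p = 1.0 / (1.0 + math.exp(-(theta - delta)))
        w = p * (1.0 - p)                  # item information (also dP/dtheta)
        expected += p
        info += w
        j += w * (1.0 - 2.0 * p)           # derivative of the item information
        dj += w * (1.0 - 2.0 * p) ** 2 - 2.0 * w * w
    return expected, info, j, dj


def wle(deltas: Sequence[float], raw_score: int,
        tol: float = 1e-10, max_iter: int = 200) -> float:
    """Warm's weighted likelihood estimate for dichotomous Rasch items.

    Solves r - sum_i P_i(theta) + J / (2I) = 0 by (damped) Newton-Raphson.
    """
    if not 0 <= raw_score <= len(deltas):
        raise ValueError("raw score must lie between 0 and the number of items")
    theta = 0.0
    for _ in range(max_iter):
        expected, info, j, dj = _test_moments(theta, deltas)
        f = raw_score - expected + j / (2.0 * info)
        f_prime = -info + (dj * info - j * j) / (2.0 * info * info)
        step = max(-1.0, min(1.0, f / f_prime))  # limit each step to one logit
        theta -= step
        if abs(step) < tol:
            break
    return theta


if __name__ == "__main__":
    items = [-1.0, -0.5, 0.5, 1.0]  # hypothetical item locations in logits
    for r in range(len(items) + 1):
        print(r, round(wle(items, r), 3))
```

Unlike maximum likelihood, the weighted estimate remains finite for zero and perfect scores, which is why the usage example can loop over every raw score.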


8 Data from benchmarking participants were not included in the item parameter calibrations. 




For all questionnaire scales in ICILS 2018, the transformation of weighted likelihood estimates to an international metric resulted in reporting scales with an ICILS 2018 average of 50 and a standard deviation of 10 for equally weighted datasets from the countries that met sample participation requirements. This was achieved by applying the following formula:

$$\theta_n' = 50 + 10 \times \frac{\theta_n - \mu_{\theta(\mathrm{ICILS18})}}{\sigma_{\theta(\mathrm{ICILS18})}}$$

where $\theta_n'$ are the scores in the international metric, $\theta_n$ are the original weighted likelihood estimates in logits, and $\mu_{\theta(\mathrm{ICILS18})}$ is the international mean of logit scores with equally weighted country subsamples. $\sigma_{\theta(\mathrm{ICILS18})}$ is the corresponding international standard deviation of the original weighted likelihood estimates.
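The transformation, together with the equal weighting of countries, can be sketched in Python. The sketch assumes each country's sampling weights are rescaled to sum to one, so every country contributes equally, and it uses the weighted population standard deviation; the text does not state these computational details, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def equal_country_mean_sd(scores: Dict[str, Sequence[float]],
                          weights: Dict[str, Sequence[float]]) -> Tuple[float, float]:
    """Weighted mean and SD with each country's weights rescaled to sum to 1."""
    pairs = []
    for country, xs in scores.items():
        ws = weights[country]
        total = sum(ws)
        pairs.extend((x, w / total) for x, w in zip(xs, ws))
    sum_w = sum(w for _, w in pairs)
    mean = sum(w * x for x, w in pairs) / sum_w
    var = sum(w * (x - mean) ** 2 for x, w in pairs) / sum_w
    return mean, math.sqrt(var)


def to_reporting_metric(theta: float, mean: float, sd: float) -> float:
    """theta' = 50 + 10 * (theta - mean) / sd."""
    return 50.0 + 10.0 * (theta - mean) / sd


if __name__ == "__main__":
    # Using the S_GENEFF parameters from Table 12.1 (mean 1.86, SD 1.36)
    print(to_reporting_metric(3.22, 1.86, 1.36))
```

A logit score one international standard deviation above the international mean maps to 60, and a score at the mean maps to 50.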


Table 12.1 presents the means and standard deviations used to transform the original scale scores 
for the student, teacher, ICT coordinator, and school principal questionnaires into the international 
metric. 


Table 12.1: Transformation parameters for new ICILS 2018 questionnaire scales (means and standard 
deviations of original IRT logit scores) 
International student questionnaire
Scale Mean SD
S_GENACT -0.63 1.49
S_SPECACT -0.89 0.97
S_USECOM 0.43 0.88
S_USEINF -1.23 0.97
S_ACCONT 0.26 1.05
S_USESTD -0.51 1.03
S_SPECLASS -2.3 1.69
S_GENCLASS -0.97 1.95
S_ICTLRN 0.66 1.69
S_GENEFF 1.86 1.36
S_SPECEFF -0.1 1.61
S_ICTPOS 1.61 1.79
S_ICTNEG 0.66 1.26
S_ICTFUT 0.25 2.00
S_CODLRN 0.25 1.85

Teacher questionnaire
Scale Mean SD
T_CLASACT -0.47 1.88
T_CODEMP 0.36 1.83
T_COLICT 1.00 2.59
T_ICTEFF 2.32 1.47
T_ICTEMP 1.00 1.94
T_ICTPRAC -0.56 1.80
T_PROFREC -0.41 19
T_PROFSTR -1.16 1.55
T_RESRC 0.19 1.90
T_USETOOL -1.85 1.43
T_USEUTIL -0.23 1.47
T_VWNEG 0.08 1.37
T_VWPOS 1.78 2.13

ICT coordinator questionnaire
Scale Mean SD
C_HINPED 0.23 1.41
C_HINRES -0.35 1.48

School principal questionnaire
Scale Mean SD
P_EXPLR 0.90 1.90
P_EXPTCH 1.32 2.33
P_ICTCO 0.97 1.47
P_ICTUSE 0.05 0.94
P_PRIORH 1.54 1.78
P_PRIORS 52 1.64
P_VWICT 3.63 2.24




ICILS 2018 TECHNICAL REPORT 


Describing questionnaire scale indices 


Questionnaire scales derived from weighted likelihood estimates (logits) present values on a continuum with an ICILS 2018 average of 50 and a standard deviation of 10 (for equally weighted national samples). This allows these scores to be interpreted by comparing individual scores or group average scores with the ICILS 2018 average, but the individual scores do not reveal anything about the actual item responses or the extent to which respondents endorsed the items used to measure the latent variable. The scaling model used to derive individual scores allows the development of descriptions of these scales through a mapping of scale scores to (expected) item responses.⁹


It is possible to describe item characteristics by using the parameters of the partial credit model to provide an estimate, for each category, of its probability of being chosen relative to the probabilities of all higher categories. This process is equivalent to computing the odds of scoring higher than a particular category.


Figure 12.1 presents the results of plotting these cumulative probabilities against scale scores for a fictitious item, where respondents rate their agreement or disagreement with a statement on a four-point scale. The three vertical lines denote those points on the latent continuum where it becomes more likely to score > 0, > 1, or > 2. These locations, $T_k$, are Thurstonian thresholds, which can be obtained through an iterative procedure that calculates summed probabilities for each category at each (decimal) point on the latent variable.
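A minimal sketch of locating Thurstonian thresholds for one partial credit item, using the same parameterization as the weighted likelihood equation. Instead of scanning the latent variable in fixed decimal steps as described above, it uses bisection, a swapped-in root-finding technique that works because the probability of scoring at least $k$ increases monotonically with $\theta$ under this model. The function names and the search interval of -10 to 10 logits are assumptions.

```python
import math
from typing import List, Sequence


def category_probs(theta: float, delta: float, taus: Sequence[float]) -> List[float]:
    """Partial credit model probabilities for categories 0..m (h = 0 term is 0)."""
    cum = [0.0]
    for tau in taus:
        cum.append(cum[-1] + (theta - delta + tau))
    top = max(cum)
    weights = [math.exp(c - top) for c in cum]
    total = sum(weights)
    return [w / total for w in weights]


def prob_at_least(theta: float, k: int, delta: float, taus: Sequence[float]) -> float:
    """Summed probability of scoring in category k or higher."""
    return sum(category_probs(theta, delta, taus)[k:])


def thurstonian_threshold(k: int, delta: float, taus: Sequence[float],
                          lo: float = -10.0, hi: float = 10.0, tol: float = 1e-8) -> float:
    """Point on the logit scale where P(X >= k) = 0.5, found by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if prob_at_least(mid, k, delta, taus) < 0.5:
            lo = mid  # the threshold lies above mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For a four-category item, `[thurstonian_threshold(k, delta, taus) for k in (1, 2, 3)]` returns the three thresholds in increasing order; converting them to the 50/10 reporting metric gives the category boundaries used in item-by-score maps.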


Figure 12.1: Summed category probabilities for fictitious item 


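[Figure: summed category probabilities (y-axis, probability from 0.0 to 1.0) plotted against theta (x-axis, -4.00 to 4.00) for a fictitious item with the categories Strongly disagree, Disagree, Agree, and Strongly agree; three vertical lines mark the Thurstonian thresholds]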


9 This approach was also used in the IEA ICCS 2009 and 2016 surveys (see Schulz and Friedman 2011, 2018) and the 
ICILS 2013 survey (see Schulz and Friedman 2015). 




Summed probabilities are not identical to expected item scores and have to be understood in terms of the probability of scoring at least a particular category.¹⁰ Thurstonian thresholds can be used to indicate for each item category those points on a scale at which respondents have a 0.5 probability of scoring in this category or higher. For example, in the case of Likert-type items with the categories strongly disagree, disagree, agree, and strongly agree, we can determine at what point of a scale a respondent has a 50 percent likelihood of agreeing (or strongly agreeing) with the item.


The item-by-score maps included in ICILS 2018 reports predict the minimum coded score (e.g., 0 = “strongly disagree,” 1 = “disagree,” 2 = “agree,” and 3 = “strongly agree”) a respondent would obtain on a Likert-type item. For example, we could predict that students with a certain scale score would have a 50 percent probability of agreeing (or strongly agreeing) with a particular item (see the example item-by-score map in Figure 12.2). For each item, it is thus possible to determine Thurstonian thresholds: the points at which a minimum item score becomes more likely than any lower score to occur and which determine the boundaries between item categories on the item-by-score map.
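Given an item's Thurstonian thresholds expressed on the reporting metric, the predicted minimum category for a scale score is simply the number of thresholds at or below that score. The Python sketch below assumes four categories coded 0 to 3, as in the example above; the function names and the threshold values in the usage note are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence

CATEGORIES = ("strongly disagree", "disagree", "agree", "strongly agree")


def to_reporting_metric(theta: float, mean: float, sd: float) -> float:
    """Convert a logit value (e.g., a threshold) to the 50/10 reporting metric."""
    return 50.0 + 10.0 * (theta - mean) / sd


def predicted_category(score: float, thresholds: Sequence[float]) -> str:
    """Highest category reached with a probability of at least 0.5 at this score.

    thresholds: the item's three Thurstonian thresholds on the reporting metric,
    in increasing order.
    """
    return CATEGORIES[bisect_right(thresholds, score)]
```

With hypothetical thresholds `[35.0, 45.0, 55.0]`, a score of 50 yields "agree" and a score of 30 yields "strongly disagree".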


Figure 12.2: Example of questionnaire item-by-score map 


[Figure: item-by-score map with horizontal bars for Items #1 to #3 across scale scores from 20 to 80 (mean = 50, standard deviation = 10), with bar segments shaded by predicted category: Strongly disagree, Disagree, Agree, Strongly agree]


Example of how to interpret the item-by-score map 

#1: A respondent with a score of 30 has a more than 50% probability of strongly disagreeing with all three items.

#2: A respondent with a score of 40 has a more than 50% probability of not strongly disagreeing with items 1 and 2 but of strongly disagreeing with item 3.

#3: A respondent with a score of 50 has a more than 50% probability of agreeing with item 1 and of disagreeing with items 2 and 3.

#4: A respondent with a score of 60 has a more than 50% probability of strongly agreeing with item 1 and of at least agreeing with items 2 and 3.

#5: A respondent with a score of 70 has a more than 50% probability of strongly agreeing with items 1, 2, and 3.


10 Other ways of describing item characteristics based on the partial credit model are item characteristic curves, which plot the individual category probabilities, and expected item score curves (for a detailed description, see Masters and Wright 1997).




This information can also be summarized by calculating the average thresholds across all items in a scale. For example, it is possible to do this for the second threshold of a four-point Likert-type scale, which allows the prediction of how likely it would be for a respondent with a certain scale score to have responses in the two lower or the two upper categories (on average across items). For ICILS 2018, we used this approach in the case of items measuring agreement to distinguish between scale scores for respondents who were most likely to agree or disagree with the “average item” used for measuring the respective latent trait.


In the reporting tables for questionnaire scales, we depicted national average student or teacher scale scores as boxes that indicated their mean values plus/minus sampling error and that were set in graphical displays featuring two underlying colors. National average scores located in the darker shaded area indicated that, on average across items, student or teacher responses had been in the lower item categories (“disagree or strongly disagree” or “less than once a week”). If these scores were found in the lighter shaded area, however, then students’ or teachers’ average item responses would have been in the upper item response categories (“agree or strongly agree” or “at least once a week”).
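Averaging the second thresholds across items gives a single cut-point on the reporting metric that separates the lower-category side from the upper-category side for the "average item," which is what the two shaded areas represent. A minimal Python sketch, with hypothetical names and values, assuming each item's thresholds are already on the 50/10 metric:

```python
from statistics import mean
from typing import Sequence


def average_threshold(thresholds_per_item: Sequence[Sequence[float]], k: int = 2) -> float:
    """Mean of the k-th Thurstonian threshold (1-based) across the items of a scale."""
    return mean([t[k - 1] for t in thresholds_per_item])


def shaded_area(score: float, cut: float) -> str:
    """Background area into which a (national average) scale score falls."""
    if score >= cut:
        return "lighter: average responses in the upper categories"
    return "darker: average responses in the lower categories"
```

For three hypothetical items whose second thresholds are 45, 48, and 51, the cut-point is 48, so a national average of 50 falls in the lighter area.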




Scaled indices 


Student questionnaire 

National index of students’ socioeconomic background 

The multivariate analyses presented in the international report (Fraillon et al. 2020) include a composite index reflecting students’ socioeconomic background. The national index of students’ socioeconomic background (S_NISB) was derived from the following three indices: highest occupational status of parents (S_HISEI), highest educational level of parents (S_HISCED), and the number of books at home (S_HOMLIT). For the S_HISCED index, we collapsed the lowest two categories to have an indicator variable with four categories: lower-secondary or below, upper-secondary, tertiary non-university, and university education. The S_HOMLIT index was reduced from five to four categories (0 to 10 books; 11 to 25 books; 26 to 100 books; more than 100 books) by collapsing the two highest categories. This was done for both indices on parental education and home literacy because prior analyses had shown approximately linear associations across these categories with CIL and CT scores as well as with other indicators of socioeconomic background.
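The category collapsing for S_HISCED and S_HOMLIT can be sketched as two small recode functions. The codes 0 to 4 for the five original categories are a hypothetical assumption (the source codes are not given here), as are the function names.

```python
def collapse_hisced(code: int) -> int:
    """Merge the two lowest of five parental-education categories (0..4 -> 0..3)."""
    if not 0 <= code <= 4:
        raise ValueError(f"unexpected S_HISCED code: {code}")
    return max(code - 1, 0)  # 0 and 1 -> 0 ("lower-secondary or below"); 4 -> 3


def collapse_homlit(code: int) -> int:
    """Merge the two highest of five home-literacy categories (0..4 -> 0..3)."""
    if not 0 <= code <= 4:
        raise ValueError(f"unexpected S_HOMLIT code: {code}")
    return min(code, 3)  # 3 and 4 -> 3 ("more than 100 books")
```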


In order to impute values for students who had missing data for one of the three indicators, we used predicted values plus a random component based on a regression on the other two variables that had been estimated for students with values on all three variables. This imputation procedure was carried out for each national sample separately.
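The imputation step can be sketched with NumPy as below. Assumptions not stated in the text: the regressions are unweighted ordinary least squares, the random component is drawn from a normal distribution with the residual standard deviation, and missing values are marked as `NaN`. The function name is hypothetical, and the function would be called once per national sample.

```python
import numpy as np


def impute_one_missing(data: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Regression imputation with a random residual for cases missing one column.

    data: (n, 3) array for ONE national sample, with np.nan marking missing values.
    Regressions are unweighted OLS estimated on complete cases (an assumption).
    """
    out = data.copy()
    complete = ~np.isnan(data).any(axis=1)
    n_cols = data.shape[1]
    for target in range(n_cols):
        predictors = [c for c in range(n_cols) if c != target]
        # Regression of the target on the other two indicators, complete cases only.
        X = np.column_stack([np.ones(complete.sum()), data[complete][:, predictors]])
        y = data[complete, target]
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid_sd = np.std(y - X @ coef, ddof=X.shape[1])
        # Cases missing the target but observed on both predictors.
        rows = np.isnan(data[:, target]) & ~np.isnan(data[:, predictors]).any(axis=1)
        X_miss = np.column_stack([np.ones(rows.sum()), data[rows][:, predictors]])
        # Predicted value plus a normally distributed random component.
        out[rows, target] = X_miss @ coef + rng.normal(0.0, resid_sd, rows.sum())
    return out
```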


After converting the resulting variables, including the imputed values, into z-standardized variables (with a mean of 0 and a standard deviation of 1 for each national dataset), we conducted principal component analyses of these indicator variables separately for each weighted national sample.


The final S_NISB scores consist of factor scores for the first principal component, with national averages of 0 and national standard deviations of 1. Table 12.2 shows the factor loadings and reliabilities for each national sample.
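The standardization and principal component step can be sketched with NumPy for one national sample. The sketch assumes complete (already imputed) indicators and uses sampling weights for the means, standard deviations, and correlation matrix; the function name and the sign convention (loadings made positive) are assumptions.

```python
import numpy as np


def nisb_scores(indicators: np.ndarray, weights: np.ndarray):
    """First-principal-component scores and loadings for one national sample.

    indicators: (n, 3) complete data (occupation, education, books at home).
    weights: (n,) sampling weights.
    """
    w = weights / weights.sum()
    mean = w @ indicators
    sd = np.sqrt(w @ (indicators - mean) ** 2)
    z = (indicators - mean) / sd                 # weighted z-scores (mean 0, SD 1)
    corr = (z * w[:, None]).T @ z                # weighted correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigenvalues in ascending order
    vec, val = eigvecs[:, -1], eigvals[-1]       # first principal component
    if vec.sum() < 0:                            # orient so that loadings are positive
        vec = -vec
    loadings = vec * np.sqrt(val)                # correlations of indicators with component
    scores = z @ vec / np.sqrt(val)              # weighted mean 0, weighted SD 1
    return scores, loadings
```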


Table 12.2: Factor loadings and reliabilities for the national index of students’ socioeconomic background

Country Factor loadings for: Cronbach’s alpha
Highest parental occupation Highest parental education Books at home
Chile 0.90 0.88 0.70 0.77
Denmark 0.79 0.79 0.72 0.65
Finland 0.80 0.76 0.56 0.52
France 0.80 0.78 0.69 0.63
Germany 0.82 0.72 0.66 0.61
Italy 0.82 0.82 0.72 0.70
Kazakhstan 0.78 0.78 0.57 0.51
Korea, Republic of 0.72 0.75 0.64 0.51
Luxembourg 0.81 0.79 0.75 0.67
Moscow (Russian Federation) 0.76 0.78 0.64 0.55
North Rhine-Westphalia (Germany) 0.81 0.68 0.69 0.59
Portugal 0.84 0.86 0.74 0.74
United States 0.80 0.83 0.68 0.66
Uruguay 0.82 0.82 0.75 0.71
ICILS 2018 average 0.80 0.78 0.67 0.62

Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.


Students’ use of ICT for different activities

Question 19 of the student questionnaire asked students to indicate their use of ICT for a range of different activities. For each of the eight activities, they were asked to select “never,” “less than once a month,” “at least once a month but not every week,” “at least once a week but not every day,” or “every day.” The items in this question were used to derive two scales. The first reflected students’ use of general applications for activities (S_GENACT), was based on the first three items of the question, and had an average reliability of 0.70 across the national samples, with Cronbach’s alpha coefficients ranging from 0.63 to 0.84 (see Table 12.3). The second scale reflected students’ use of specialist applications for activities (S_SPECACT). It was derived from the remaining five items in the question, with reliabilities ranging from 0.65 to 0.82 and an average of 0.73. Higher values on each scale reflect more frequent use of ICT for the corresponding activities.
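The reliability tables in this section report Cronbach's alpha, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_X^2}\right)$, where $\sigma_i^2$ are the item variances and $\sigma_X^2$ is the variance of the sum score. The minimal Python sketch below implements this standard formula; it ignores sampling weights and missing responses, so it is an illustration rather than a reproduction of the reported values, and the function name is hypothetical.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete, unweighted item scores (rows = respondents)."""
    k = len(responses[0])
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)
```

Perfectly consistent items give an alpha of 1, while uncorrelated items give an alpha close to 0.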


Figure 12.3 depicts the results from the confirmatory factor analysis of the scaled items from the question. The two-factor model had satisfactory fit, and we found a high positive correlation between the two latent factors. When reviewing measurement invariance using multiple-group models with different constraints, the model fit changed only marginally, which indicates a relatively high degree of invariance for this model. The reliabilities were satisfactory for all countries. The item parameters for both scales that were used to derive the IRT scale scores are presented in Table 12.4.
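The fit indices reported with each confirmatory factor analysis figure are functions of the model chi-square statistic. The Python sketch below uses common textbook formulas for RMSEA, CFI, and TLI (with $N - 1$ in the RMSEA denominator; some software uses $N$ instead). It is not necessarily how the ICILS analysis software computed them, and the chi-square inputs in the usage note are hypothetical.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (requires df > 0)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2: float, df: int, chi2_base: float, df_base: int) -> float:
    """Comparative fit index relative to the baseline (independence) model."""
    num = max(chi2 - df, 0.0)
    den = max(chi2 - df, chi2_base - df_base, 0.0)
    return 1.0 - num / den if den > 0 else 1.0


def tli(chi2: float, df: int, chi2_base: float, df_base: int) -> float:
    """Tucker-Lewis index (non-normed fit index)."""
    ratio_base = chi2_base / df_base
    return (ratio_base - chi2 / df) / (ratio_base - 1.0)
```

For example, a hypothetical model with a chi-square of 150 on 50 degrees of freedom in a sample of 101 gives an RMSEA of about 0.141.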




Figure 12.3: Confirmatory factor analysis of items measuring students’ use of ICT for activities 


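[Figure: path diagram of the two-factor model, with the first three items of Question 19 loading on S_GENACT, the remaining five items loading on S_SPECACT, and the correlation between the two latent factors]

Model fit indices    Pooled sample    Multiple-group models
                                      Configural    Metric    Scalar
RMSEA                0.060            0.077         0.096     0.091
CFI                  0.98             0.97          0.93      0.89
TLI                  0.96             0.95          0.92      0.93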


Table 12.3: Reliabilities for scales measuring students’ use of ICT for activities


Country Scale reliability (Cronbach's alpha) 
S_GENACT S_SPECACT 
Chile 0.68 0.70 
Denmark 0.70 0.74 
Finland 0.71 0.73 
France 0.65 0.71 
Germany 0.69 0.70 
Italy 0.63 0.71
Kazakhstan 0.75 0.78
Korea, Republic of 0.84 0.82
Luxembourg 0.68 0.75 
Moscow (Russian Federation) 0.72 0.75 
North Rhine-Westphalia (Germany) 0.69 0.65 
Portugal 0.69 0.73 
United States 0.67 0.70 
Uruguay 0.71 0.70 
ICILS 2018 average 0.70 0.73 


Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.




Table 12.4: Item parameters for scales measuring students’ use of ICT for activities 




Scale or item Question/item wording Item parameters
Delta Tau(1) Tau(2) Tau(3) Tau(4)
S_GENACT How often do you use ICT for each of the following activities?
S2G19A Write or edit documents -0.42 -1.89 -0.57 0.34 2.12
S2G19B Use a spreadsheet to do calculations, store data or plot graphs (e.g. using [Microsoft Excel ®]) 0.33 -1.61 -0.46 0.00 2.08
S2G19C Create a simple “slideshow” presentation (e.g. using [Microsoft PowerPoint ®]) 0.10 -2.61 -0.71 0.67 2.65
S_SPECACT How often do you use ICT for each of the following activities?
S2G19D Record or edit videos -0.44 -0.78 -0.17 -0.09 1.03
S2G19E Write computer programs, scripts or apps (e.g. using [Logo, LUA, or Scratch]) 0.29 -0.49 -0.22 -0.19 0.90
S2G19F Use drawing, painting or graphics software or [apps] -0.18 -0.64 -0.05 -0.03 0.71
S2G19G Produce or edit music -0.14 0.34 -0.13 -0.22 0.01
S2G19H Build or edit a webpage 0.46 0.16 -0.20 -0.26 0.30


Students’ use of ICT for communication activities 


In Question 20, respondents were asked to indicate the frequency with which they use ICT for 10 different communication activities. For each activity, they were asked to select “never,” “less than once a month,” “at least once a month but not every week,” “at least once a week but not every day,” or “every day.” Two scales were derived from this item set. The first reflected students’ use of ICT for social communication (S_USECOM), while the second reflected students’ use of ICT for exchanging information (S_USEINF). Higher scores on both of these scales indicated more frequent use of ICT for these purposes.

Figure 12.4 depicts the results from the confirmatory factor analysis of the scaled items from the question. There was only marginally satisfactory fit for the two-factor model, and there was a moderate correlation between the two latent factors (r = 0.58). When reviewing measurement invariance across multiple-group models with different constraints, the model fit was not satisfactory for the most constrained models, which suggests a lack of measurement invariance across countries. The national scale reliabilities for the two scales are presented in Table 12.5. Both scales had an acceptable average reliability across participating countries (0.78 and 0.75, respectively), with national reliabilities ranging from 0.72 to 0.82 for S_USECOM and from 0.72 to 0.80 for S_USEINF. The item parameters for the two scales are presented in Table 12.6.




Figure 12.4: Confirmatory factor analysis of items measuring students’ use of ICT for communication 
activities 


[Figure: path diagram of the two-factor model, with items q20a to q20d and q20h to q20j loading on S_USECOM, items q20e to q20g loading on S_USEINF, and the correlation between the two latent factors]

Model fit indices    Pooled sample    Multiple-group models
                                      Configural    Metric    Scalar
RMSEA                0.090            0.107         0.109     0.108
CFI                  0.93             0.92          0.90      0.84
TLI                  0.91             0.90          0.89      0.89




Table 12.5: Reliabilities for scales measuring students’ use of ICT for communication activities
Country Scale reliability (Cronbach’s alpha)
S_USECOM S_USEINF
Chile 0.80 0.73
Denmark 0.73 0.75
Finland 0.74 0.73
France 0.80 0.73
Germany 0.73 0.76
Italy 0.77 0.74
Kazakhstan 0.82 0.77
Korea, Republic of 0.81 0.80
Luxembourg 0.76 0.80
Moscow (Russian Federation) 0.76 0.75
North Rhine-Westphalia (Germany) 0.72 0.73
Portugal 0.76 0.76
United States 0.82 0.76
Uruguay 0.78 0.72
ICILS 2018 average 0.77 0.75
Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.


Table 12.6: Item parameters for scales measuring students’ use of ICT for communication activities 


Scale or item Question/item wording Item parameters
Delta Tau(1) Tau(2) Tau(3) Tau(4)
S_USECOM How often do you use ICT to do each of the following communication activities?
S2G20A Share news about current events on social media 0.51 -0.14 0.10 -0.30 0.34
S2G20B Communicate with friends, family, or other people using instant messaging, voice or video chat (e.g. [Skype, WhatsApp, Viber]) -0.78 0.06 0.34 -0.10 -0.30
S2G20C Send texts or instant messages to friends, family, or other people -0.71 0.15 0.38 -0.15 -0.38
S2G20D Write posts and updates about what happens in your life on social media 0.79 -0.09 -0.13 -0.26 0.48
S2G20H Post images or video in social networks or online communities (e.g. [Facebook, Instagram or YouTube]) 0.44 -0.49 -0.21 0.04 0.67
S2G20I Watch videos or images that other people have posted online -0.78 0.12 0.34 -0.12 -0.34
S2G20J Send or forward information about events or activities to other people 0.52 -0.37 -0.20 -0.09 0.66
S_USEINF How often do you use ICT to do each of the following communication activities?
S2G20E Ask questions on forums or [Q&A, question and answer] websites -0.11 -0.55 -0.49 0.08 0.97
S2G20F Answer other people’s questions on forums or [Q&A, question and answer] websites -0.10 -0.37 -0.43 0.03 0.77
S2G20G Write posts for your own blog (e.g. [WordPress, Tumblr, Blogger]) 0.21 0.34 -0.49 -0.28 0.43


Students’ use of ICT for accessing content from the internet

Students were asked in Question 21 to indicate the frequency with which they use ICT for a list of leisure activities (“never,” “less than once a month,” “at least once a month but not every week,” “at least once a week but not every day,” or “every day”). The first five items of the question were used to derive a scale of students’ use of ICT for accessing content from the internet (S_ACCONT). Higher scores on the scale correspond to more frequent use of ICT for leisure activities. A confirmatory factor analysis assuming a one-dimensional model for the five items revealed only marginally satisfactory fit (see Figure 12.5). A review of measurement invariance indicated some variation in model fit across multiple-group models with different constraints, which suggests considerable variation in measurement characteristics. The reliabilities (Cronbach’s alpha) of the scale were satisfactory for all countries, with an average of 0.76 across countries (Table 12.7). The item parameters for the scale are presented in Table 12.8.


Figure 12.5: Confirmatory factor analysis of items measuring students’ use of ICT for leisure activities 


[Figure: path diagram of the one-factor model, with items q21a to q21e loading on S_ACCONT]

Model fit indices    Pooled sample    Multiple-group models
                                      Configural    Metric    Scalar
RMSEA                0.098            0.102         0.079     0.094
CFI                  0.97             0.97          0.97      0.90
TLI                  0.95             0.95          0.97      0.95




Table 12.7: Reliabilities for scale measuring students’ use of ICT for leisure activities 


Country Scale reliability (Cronbach's alpha) 
S_ACCONT
Chile 0.76
Denmark 0.73
Finland 0.78
France 0.76
Germany 0.72
Italy 0.73
Kazakhstan 0.79
Korea, Republic of 0.83
Luxembourg 0.74 
Moscow (Russian Federation) 0.68 
North Rhine-Westphalia (Germany) 0.73 
Portugal 0.74 
United States 0.79 
Uruguay 0.76 
ICILS 2018 average 0.75 


Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, 
excluding benchmarking participants. 


Table 12.8: Item parameters for scale measuring students’ use of ICT for leisure activities 


Scale or item Question/item wording Item parameters
Delta Tau(1) Tau(2) Tau(3) Tau(4)
S_ACCONT How often do you use ICT to do each of the following leisure activities?
S2G21A Search the Internet to find information about places to go or activities to do 0.29 -1.26 -0.25 0.17 1.34
S2G21B Read reviews on the Internet of things you might want to buy 0.30 -0.82 -0.41 0.05 1.18
S2G21C Read news stories on the Internet 0.00 -0.57 -0.09 -0.03 0.69
S2G21D Search for online information about things you are interested in -0.60 -0.66 -0.31 -0.09 1.06
S2G21E Use websites, forums, or online videos to find out how to do something 0.01 -0.65 -0.46 -0.04 1.15


Students’ use of ICT for study purposes

In Question 22, students were asked to indicate the frequency with which they use ICT for 10 different school-related purposes. For each item, they were asked to select “never,” “less than once a month,” “at least once a month but not every week,” “at least once a week but not every school day,” or “every school day.” All items in the question were used to derive a scale of students’ use of ICT for study purposes (S_USESTD), where higher scale scores reflect more frequent use of ICT. The results from a confirmatory factor analysis for the model are presented in Figure 12.6. Overall, the model showed only marginally satisfactory fit, and reviews of multiple-group model comparisons show that there were differences in model fit between the metric and scalar invariance models, which suggests that there was some variation in measurement characteristics. The reliabilities (Cronbach’s alpha) of the scale are presented in Table 12.9 and show a high average reliability of 0.83, ranging between 0.77 and 0.89 across countries. The item parameters for the scale are presented in Table 12.10.


Figure 12.6: Confirmatory factor analysis of items measuring students’ use of ICT for study purposes 


[Figure: path diagram of the one-factor model, with items q22a to q22j loading on S_USESTD]

Model fit indices    Pooled sample    Multiple-group models
                                      Configural    Metric    Scalar
RMSEA                0.072            0.086         0.097     0.108
CFI                  0.97             0.96          0.93      0.86
TLI                  0.96             0.94          0.92      0.91




Table 12.9: Reliabilities for scale measuring students’ use of ICT for study purposes 


Country Scale reliability (Cronbach's alpha) 
S_USESTD 
Chile 0.82 
Denmark 0.77 
Finland 0.86 
France 0.81 
Germany 0.82 
Italy 0.80
Kazakhstan 0.87
Korea, Republic of 0.89
Luxembourg 0.84 
Moscow (Russian Federation) 0.82 
North Rhine-Westphalia (Germany) 0.81 
Portugal 0.84 
United States 0.84 
Uruguay 0.84 
ICILS 2018 average 0.83 


Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, 
excluding benchmarking participants. 


Table 12.10: Item parameters for scale measuring students’ use of ICT for study purposes 


Scale or item Question/item wording Item parameters
Delta Tau(1) Tau(2) Tau(3) Tau(4)
S_USESTD How often do you use ICT for the following school-related purposes?
S2G22A Prepare reports or essays -0.08 -1.38 -0.51 0.23 1.66
S2G22B Prepare presentations -0.06 -1.87 -0.55 0.49 1.93
S2G22C Work online with other students 0.08 -0.46 -0.26 -0.11 0.83
S2G22D Complete [worksheets] or exercises -0.09 -0.60 -0.22 -0.11 0.94
S2G22E Organize your time and work -0.03 -0.10 -0.24 -0.13 0.47
S2G22F Take tests 0.28 -0.71 -0.48 -0.07 1.27
S2G22G Use software or applications to learn skills or a subject (e.g. mathematics tutoring software, language learning software) 0.11 -0.54 -0.40 -0.08 1.03
S2G22H Use the Internet to do research -1.12 -0.92 -0.36 0.20 1.08
S2G22I Use coding software to complete assignments (e.g. [Scratch]) 0.57 -0.25 -0.46 -0.15 0.86
S2G22J Make video or audio productions 0.33 -0.17 -0.05 -0.12 0.34


Students’ use of ICT for class activities

Question 24 required students to indicate how often they used different ICT-related tools during class (selecting from “never,” “in some lessons,” “in most lessons,” or “in every or almost every lesson”). Three of the items were used to derive a scale of students’ use of general applications in class (S_GENCLASS), and six of the items were used to derive a scale of students’ use of specialist applications in class (S_SPECLASS). Higher scale scores reflect more frequent use of the respective type of ICT applications for class activities.

A confirmatory factor analysis using all scaled items from this question (see Figure 12.7) revealed satisfactory fit for a two-dimensional model, with a strong positive correlation between the two latent factors (0.66). The fit of the model was also acceptable when comparing multiple-group models with different constraints, which suggests a relatively high degree of measurement invariance. The national reliabilities for the two scales are presented in Table 12.11. Both scales had acceptable reliabilities (Cronbach’s alpha), with averages of 0.72 and 0.84 across countries, respectively (ranging from 0.53 to 0.81 for S_GENCLASS and from 0.78 to 0.92 for S_SPECLASS). The item parameters for both scales are displayed in Table 12.12.


Figure 12.7: Confirmatory factor analysis of items measuring students’ reports on the use of ICT for class activities

[Figure: path diagram of the two-factor model, with items q24b, q24c, and q24i loading on S_GENCLASS, items q24e to q24h, q24j, and q24k loading on S_SPECLASS, and the correlation between the two latent factors]

Model fit indices    Pooled sample    Multiple-group models
                                      Configural    Metric    Scalar
RMSEA                0.075            0.087         0.096     0.085
CFI                  0.97             0.97          0.96      0.96
TLI                  0.97             0.96          0.96      0.97




Table 12.11: Reliabilities for scales measuring students’ reports on the use of ICT for class activities 


Country Scale reliability (Cronbach’s alpha) 
S_GENCLASS S_SPECLASS 
Chile 0.76 0.84 
Denmark 0.53 0.82 
Finland 0.70 0.78 
France 0.73 0.78 
Germany 0.67 0.82 
Italy 0.74 0.82
Kazakhstan 0.76 0.87
Korea, Republic of 0.81 0.92
Luxembourg 0.76 0.89
Moscow (Russian Federation) 0.73 0.82
North Rhine-Westphalia (Germany) 0.69 0.84
Portugal 0.78 0.86
United States 0.70 0.85
Uruguay 0.72 0.86
ICILS 2018 average 0.72 0.84
Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.


Table 12.12: Item parameters for scales measuring students’ reports on the use of ICT for class activities 


Scale or item Question/item wording Item parameters
Delta Tau(1) Tau(2) Tau(3)
S_GENCLASS When studying throughout this school year, how often did you use the following tools during class?
S2G24B Word-processing software (e.g. [Microsoft Word ®]) -0.14 -2.51 0.61 1.90
S2G24C Presentation software (e.g. [Microsoft PowerPoint ®]) 0.03 -2.98 0.53 2.45
S2G24I Computer-based information resources (e.g. websites, wikis, encyclopaedia) 0.12 -2.26 0.17 2.08
S_SPECLASS When studying throughout this school year, how often did you use the following tools during class?
S2G24E Multimedia production tools (e.g. media capture and editing, web production) -0.07 -1.86 0.39 1.47
S2G24F Concept mapping software (e.g. [Inspiration ®], [Webspiration ®]) 0.37 -1.67 0.19 1.48
S2G24G Tools that capture real-world data (e.g. speed, temperature) digitally for analysis 0.16 -1.82 0.33 1.49
S2G24H Simulations and modelling software 0.35 -1.68 0.26 1.42
S2G24J Interactive digital learning resources (e.g. learning games or applications) -0.48 -2.06 0.45 1.61
S2G24K Graphing or drawing software -0.33 -1.82 0.31 1.51


Students’ perceptions of school learning of ICT and coding tasks

In Question 25, students were asked to indicate the extent to which they had learned about eight different tasks at school that are related to using ICT. They had to select between four different options for each task (“to a large extent,” “to a moderate extent,” “to a small extent,” or “not at all”). The items were used to derive a scale of students’ learning of ICT tasks at school (S_ICTLRN). Higher scale scores corresponded to higher perceived learning of different tasks involving ICT.

Question 29 required students to indicate the extent to which they had been taught how to do tasks that are related to coding (selecting from the same response options as in Question 25). The nine items in the question were used to derive a scale of students’ learning of ICT coding tasks at school (S_CODLRN).


Figure 12.8 shows the results from a confirmatory factor analysis of all scaled items measuring students’ perceptions of school learning of ICT and coding tasks. The model fit was satisfactory, and a moderate correlation was observed between the two latent factors (0.51). When reviewing measurement invariance using multiple-group models with different constraints, the model fit changed only marginally, which indicates a relatively high degree of invariance for this model.


Figure 12.8: Confirmatory factor analysis of items measuring students’ perceptions of school learning of ICT and coding tasks

[Figure: path diagram of the two-factor model, with items q25a to q25h loading on S_ICTLRN, items q29a to q29i loading on S_CODLRN, and the correlation between the two latent factors]

Model fit indices    Pooled sample    Multiple-group models
                                      Configural    Metric    Scalar
RMSEA                0.063            0.071         N/A       0.078
CFI                  0.97             0.97          N/A       0.94
TLI                  0.96             0.96          N/A       0.95




Table 12.13 shows the scale reliabilities (Cronbach’s alpha) for the two scales reflecting the extent of learning of ICT and coding tasks. The reliabilities were satisfactory for all countries. S_ICTLRN had a high level of reliability, with an average of 0.88 across countries (ranging from 0.83 to 0.94), and S_CODLRN also had a high average reliability (Cronbach’s alpha = 0.90) across countries (ranging from 0.85 to 0.96). The item parameters for both scales that were used to derive the IRT scale scores are presented in Table 12.14.


Table 12.13: Reliabilities for scales measuring students’ perceptions of school learning of ICT and coding tasks 


Country Scale reliability (Cronbach's alpha) 
S_ICTLRN S_CODLRN 
Chile 0.86 0.90 
Denmark 0.83 0.85 
Finland 0.92 0.91 
France 0.84 0.88 
Germany 0.86 0.89 
Italy 0.87 0.88
Kazakhstan 0.89 0.90
Korea, Republic of 0.94 0.96
Luxembourg 0.88 0.91 
Moscow (Russian Federation) 0.94 0.91 
North Rhine-Westphalia (Germany) 0.87 0.89 
Portugal 0.91 0.92 
United States 0.89 0.90 
Uruguay 0.87 0.90 
ICILS 2018 average 0.88 0.90 


Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.


Table 12.14: Item parameters for scales measuring students’ perceptions of school learning of ICT and coding tasks


Scale or item Question/item wording Item parameters 
Delta Tau(1) Tau(2) Tau(3) 
S ICTLRN At school, to what extent have you learned how to do the following tasks? 
S2G25A Provide references to Internet sources -0.04 =. 57 -0.32 1.88 
S2G25B Search for information using ICT -0.38 -1.26 -0.34 1.60 
S$2G25C Present information for a given audience or purpose 0.10 -1.35 -0.38 1.73 
using ICT 
$2G25D Work out whether to trust information from the Internet 0.08 -1.36 -0.30 1.66 
$2G25E Decide what information obtained from the Internet is -0.03 -1.38 -0.37 1.75 
relevant to include in school work 
S2G25F Organize information obtained from Internet sources -0.04 -1.50 -0.33 1.82 
S2G25G Decide where to look for information on the Internet -0.01 -1.32 -0.32 1.64 
about an unfamiliar topic 
S$2G25H Use ICT to collaborate with others 0.31 -1.12 -0.27 1.39 
S CODLRN When studying during the current school year, to what extent have you been taught how to do the following tasks? 
S2G29A To display information in different ways -0.89 -1.59 -O.49 2.08 
S2G29B To break a complex process into smaller parts 0.20 -1.56 -0.51 2.06 
$2G29C To understand diagrams that describe or show real- -0.30 -1.64 -0.29 1.93 
world problems 
$2G29D To plan tasks by setting out the steps needed to -0.27 -1.57 -0.35 1.92 
complete them 
S2G29E To use tools to make diagrams that help solve problems 0.01 -1.52 -0.30 1.82 
S2G29F To use simulations to help understand or solve real 0.57 -1.42 -0.39 1.84 
world problems 
$2G29G To make flow diagrams to show the different parts of 0.68 -1.30 -0.32 1.62 
a process 
$2G29H To record and evaluate data to understand and solve 0.01 -1.60 -0.31 1.91 
a problem 
$2G29| To use real-world data to review and revise solutions 0.00 -1.45 -0.35 1.80 


to problems 
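The Delta and Tau columns can be read as partial credit model parameters: for item i, the step from category k − 1 to category k is located at δᵢ + τᵢₖ on the latent scale, and the Tau values reported for each item sum to approximately zero. The sketch below assumes this parameterization and scores the four response categories 0 to 3 in the direction of the scale (both assumptions of this illustration); it computes category probabilities for item S2G25B at a few latent values θ in logits. The function name is hypothetical.

```python
import math


def pcm_probabilities(theta, delta, taus):
    """Category probabilities (scores 0..m) under the partial credit model.

    theta: latent value in logits; delta: item location;
    taus: step parameters tau_1..tau_m.
    """
    # Score 0 has exponent 0; each further category adds (theta - delta - tau_k)
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + (theta - delta - tau))
    # Subtract the largest exponent before exponentiating, for numerical stability
    largest = max(exponents)
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Item S2G25B from Table 12.14
DELTA, TAUS = -0.38, (-1.26, -0.34, 1.60)

for theta in (-2.0, 0.0, 2.0):
    probs = pcm_probabilities(theta, DELTA, TAUS)
    print(theta, [round(p, 3) for p in probs])
```

At θ = δ + τ₁ the two lowest categories are equally likely, and raising θ shifts probability toward the higher categories. Item-specific Tau values are what distinguish this from a rating scale model, in which all items would share the same step parameters.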


Students’ self-efficacy 


Question 27 of the ICILS 2018 student questionnaire asked respondents to indicate how well they could do a series of tasks when using ICT (response categories were “I know how to do this,” “I have never done this but I could work out how to do this,” and “I do not think I could do this”). Twelve of the thirteen items from this question provided data for deriving two scales: students’ self-efficacy regarding the use of general applications (S_GENEFF) and students’ self-efficacy regarding the use of specialist applications (S_SPECEFF).

Figure 12.9 illustrates the results of the confirmatory factor analysis assuming a two-dimensional model with items from the two scales. The model fit was very good, and there was a moderate correlation between the two latent factors (0.49). When reviewing measurement invariance using multiple-group models with different constraints, the results showed that model fit remained equally satisfactory with more constrained models, suggesting a high degree of measurement invariance.

S_GENEFF was derived from eight items and had an average reliability (Cronbach’s alpha) of 0.83, with coefficients ranging from 0.76 to 0.92 across participating countries (see Table 12.15). S_SPECEFF was based on four items and had an average scale reliability of 0.74, with coefficients ranging from 0.68 to 0.79. Higher values on these two scales represent a greater degree of self-efficacy. The item parameters for the two scales used for scaling are recorded in Table 12.16.




Figure 12.9: Confirmatory factor analysis of items measuring students’ ICT self-efficacy 




Model fit indices   Pooled sample   Multiple-group models
                                    Configural   Metric   Scalar
RMSEA               0.075           0.082        0.079    0.082
CFI                 0.94            0.94         0.935    0.92
TLI                 0.92            0.93         0.931    0.93




Table 12.15: Reliabilities for scales measuring students’ ICT self-efficacy 


Country                             Scale reliability (Cronbach’s alpha)
                                    S_GENEFF    S_SPECEFF
Chile                               0.81        0.73
Denmark                             0.76        0.75
Finland                             0.82        0.73
France                              0.80        0.68
Germany                             0.83        0.75
Italy                               0.77        0.72
Kazakhstan                          0.92        0.74
Korea, Republic of                  0.89        0.79
Luxembourg                          0.85        0.74
Moscow (Russian Federation)         0.89        0.71
North Rhine-Westphalia (Germany)    0.81        0.72
Portugal                            0.81        0.73
United States                       0.84        0.76
Uruguay                             0.83        0.73
ICILS 2018 average                  0.83        0.73

Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.


Table 12.16: Item parameters for scales measuring students’ ICT self-efficacy 


Scale or item  Question/item wording  Item parameters
Delta  Tau(1)  Tau(2)
S_GENEFF  How well can you do each of these tasks when using ICT?
S2G27A  Edit digital photographs or other graphic images  0.17  -0.57  0.57
S2G27C  Write or edit text for a school assignment  -0.28  -0.16  0.16
S2G27D  Search for and find relevant information for a school project on the Internet  -0.41  -0.16  0.16
S2G27  Create a multi-media presentation (with sound, pictures, or video)  0.65  -0.78  0.78
S2G275  Upload text, images, or video to an online profile  0.06  -0.44  0.44
S2G27  Insert an image into a document or message  -0.29  -0.12  0.12
S2G27L  Install a program or [app]  -0.20  -0.07  0.07
S2G27  Judge whether you can trust information you find on the Internet  0.30  -0.71  0.71
S_SPECEFF  How well can you do each of these tasks when using ICT?
S2G27B  Create a database (e.g. using [Microsoft Access®])  -0.06  -1.15  1.15
S2G27E  Build or edit a webpage  -0.28  -1.17  1.17
S2G27G  Create a computer program, macro, or [app] (e.g. in [Basic, Visual Basic])  0.62  -1.04  1.04
S2G27H  Set up a local area network of computers or other ICT  -0.28  -0.69  0.69
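For the three-category items in Table 12.16, the two Tau values are mirror images (Tau(1) = −Tau(2)), which is what a sum-to-zero constraint on the step parameters implies when an item has only two steps. Reading the columns as partial credit model parameters (an assumption of this sketch), δ + τ₁ and δ + τ₂ are the points on the latent scale where adjacent response categories are equally likely. The sketch checks the constraint and computes these points for the S_SPECEFF items; the names are hypothetical.

```python
# Delta and (Tau(1), Tau(2)) for the S_SPECEFF items in Table 12.16
S_SPECEFF_ITEMS = {
    "S2G27B": (-0.06, (-1.15, 1.15)),
    "S2G27E": (-0.28, (-1.17, 1.17)),
    "S2G27G": (0.62, (-1.04, 1.04)),
    "S2G27H": (-0.28, (-0.69, 0.69)),
}


def step_locations(delta, taus):
    """Latent values (logits) at which adjacent categories are equally likely."""
    return [round(delta + tau, 2) for tau in taus]


for item, (delta, taus) in S_SPECEFF_ITEMS.items():
    # The reported step parameters should sum to zero up to rounding
    assert abs(sum(taus)) < 0.02, item
    print(item, step_locations(delta, taus))
```

Of these four items, S2G27G has the highest step locations, so on this reading students need the most self-efficacy before they report being able to create a computer program, macro, or app.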




Students’ perceptions of ICT 


In Question 28 of the student questionnaire, respondents were asked to provide their level of agreement (“strongly agree,” “agree,” “disagree,” “strongly disagree”) with a series of statements about ICT. The following scales were derived from the 11 items in the question:

• Students’ perceptions of positive outcomes of ICT for society (S_ICTPOS)
• Students’ perceptions of negative outcomes of ICT for society (S_ICTNEG)
• Students’ expectations of future ICT use for work and study (S_ICTFUT)

Higher scores on these scales corresponded to stronger views (positive views, negative views, or greater expectations, respectively). A confirmatory factor analysis of the items in the question (see Figure 12.10) supported a three-dimensional model. The model fit was good, even when constraints were placed on the multiple-group models, which broadly suggests a relatively high level of measurement invariance. There was a strong positive correlation between the S_ICTPOS and S_ICTFUT latent factors, whereas S_ICTNEG had only weak correlations with the other two scales.
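Because the agreement categories are listed from “strongly agree” to “strongly disagree,” the response codes have to run in the opposite direction for higher scores to indicate stronger views. The sketch below shows one way to do this and, purely for illustration, averages the scored items for S_ICTPOS and S_ICTNEG using the item assignments in Table 12.18. The 0 to 3 codes and all names are assumptions, and the raw means are not the IRT scale scores used in ICILS 2018.

```python
from statistics import mean

# Assumed scoring: higher codes mean stronger agreement
AGREEMENT_SCORES = {
    "strongly agree": 3,
    "agree": 2,
    "disagree": 1,
    "strongly disagree": 0,
}

# Item-to-scale assignment as listed in Table 12.18
SCALE_ITEMS = {
    "S_ICTPOS": ["S2G28A", "S2G28B", "S2G28F", "S2G28G"],
    "S_ICTNEG": ["S2G28C", "S2G28D", "S2G28E", "S2G28H"],
}


def score(label):
    """Map a response label to its code; omitted or unknown responses give None."""
    if label is None:
        return None
    return AGREEMENT_SCORES.get(label.strip().lower())


def raw_scale_means(answers):
    """Mean item score per scale, ignoring missing responses (illustration only)."""
    result = {}
    for scale, items in SCALE_ITEMS.items():
        valid = [s for s in (score(answers.get(item)) for item in items) if s is not None]
        result[scale] = mean(valid) if valid else None
    return result


student = {
    "S2G28A": "Strongly agree", "S2G28B": "agree", "S2G28F": "agree",
    "S2G28G": "strongly agree", "S2G28C": "disagree", "S2G28D": "strongly disagree",
    "S2G28E": "agree",  # S2G28H omitted
}
print(raw_scale_means(student))
```

All items within each scale are worded in the same direction, so no item-level reversal is needed inside a scale; a high S_ICTNEG value simply means stronger agreement with the negative statements.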
Figure 12.10: Confirmatory factor analysis of items measuring students’ perceptions of ICT 
Model fit indices   Pooled sample   Multiple-group models
                                    Configural   Metric   Scalar
RMSEA               0.042           0.063        0.064    0.075
CFI                 0.98            0.97         0.96     0.92
TLI                 0.98            0.96         0.95     0.94


The average reliability (Cronbach’s alpha) across countries was 0.75 for S_ICTPOS (ranging from 0.67 to 0.85), 0.66 for S_ICTNEG (ranging from 0.62 to 0.72), and 0.80 for S_ICTFUT (ranging from 0.74 to 0.85) (see Table 12.17). Item parameters for the three scales are presented in Table 12.18.


Table 12.17: Reliabilities for scales measuring students’ perceptions of ICT 


Country                             Scale reliability (Cronbach’s alpha)
                                    S_ICTPOS    S_ICTNEG    S_ICTFUT
Chile                               0.75        0.65        0.80
Denmark                             0.67        0.63        0.84
Finland                             0.81        0.67        0.83
France                              0.73        0.68        0.82
Germany                             0.72        0.65        0.82
Italy                               0.73        0.63        0.74
Kazakhstan                          0.79        0.68        0.79
Korea, Republic of                  0.85        0.72        0.84
Luxembourg                          0.73        0.64        0.80
Moscow (Russian Federation)         0.78        0.68        0.80
North Rhine-Westphalia (Germany)    0.70        0.62        0.85
Portugal                            0.74        0.66        0.75
United States                       0.75        0.67        0.81
Uruguay                             0.77        0.68        0.75
ICILS 2018 average                  0.75        0.66        0.81

Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.

Table 12.18: Item parameters for scales measuring students’ perceptions of ICT

Scale or item  Question/item wording  Item parameters
Delta  Tau(1)  Tau(2)  Tau(3)
S_ICTPOS  How much do you agree or disagree with the following statements about ICT?
S2G28A  Advances in ICT usually improve people’s living conditions.  0.01  -1.95  -0.79  2.75
S2G28B  ICT helps us to understand the world better.  -0.11  -2.03  -0.70  2.73
S2G28F  ICT is valuable to society.  0.02  -1.98  -0.64  2.62
S2G28G  Advances in ICT bring many social benefits.  0.08  -2.06  -0.58  2.64
S_ICTNEG  How much do you agree or disagree with the following statements about ICT?
S2G28C  Using ICT makes people more isolated in society.  0.00  -1.65  0.03  1.62
S2G28D  With more ICT there will be fewer jobs.  0.47  -1.62  0.22  1.40
S2G28E  People spend far too much time using ICT.  -0.48  -1.18  -0.27  1.45
S2G28H  Using ICT may be dangerous for people’s health.  0.02  -1.33  -0.27  1.59
S_ICTFUT  How much do you agree or disagree with the following statements about ICT?
S2G28  I would like to study subjects related to ICT after [secondary school].  0.34  -1.90  -0.07  1.97
S2G28J  I hope to find a job that involves advanced ICT.  0.24  -1.96  -0.06  2.02
S2G28  Learning how to use ICT applications will help me to do the work I am interested in.  -0.57  -1.79  -0.40  2.19




Teacher questionnaire 

Teachers’ ICT self-efficacy 

Question 7 of the ICILS 2018 teacher questionnaire asked respondents to indicate how well they could do nine different school-related tasks using ICT. They were asked to select “I know how to do this,” “I haven’t done this but I could find out how,” or “I do not think I could do this.” All items were used to derive a scale of teachers’ ICT self-efficacy (T_ICTEFF). Higher scale scores corresponded to a greater degree of teacher self-efficacy in using ICT for these tasks.

A confirmatory factor analysis (see Figure 12.11) showed that the model fit was highly satisfactory. When comparing the configural and scalar multiple-group models,¹¹ the model fit was somewhat less satisfactory, suggesting some variation in measurement properties for this model. The items in the scale were shown to be highly reliable: the average reliability (Cronbach’s alpha) across countries was 0.81 (ranging from 0.73 to 0.82) (see Table 12.19). The item parameters for the scale are presented in Table 12.20.
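The RMSEA, CFI, and TLI values reported with each confirmatory factor analysis are functions of the model chi-square statistic, its degrees of freedom, the sample size, and the chi-square of a baseline (independence) model. The sketch below uses common textbook formulas; the inputs in the usage lines are hypothetical because chi-square values are not reported in this section, and software packages differ in details (for example, some use N rather than N − 1 in the RMSEA denominator). Function names are illustrative.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 variant)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index of model m against baseline model b."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis index (non-normed fit index)."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


# Hypothetical one-factor model for nine items: 45 moments - 18 parameters = 27 df;
# the independence baseline estimates only the 9 variances, leaving 36 df
print(round(rmsea(500.0, 27, 2001), 3))
print(round(cfi(500.0, 27, 9000.0, 36), 3))
print(round(tli(500.0, 27, 9000.0, 36), 3))
```

Because the TLI divides each chi-square by its degrees of freedom, it rewards parsimony, which is one reason a more constrained multiple-group model can show a TLI that holds steady or even rises while its CFI falls.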


Figure 12.11: Confirmatory factor analysis of items measuring teachers’ ICT self-efficacy 


Model fit indices   Pooled sample   Multiple-group models
                                    Configural   Metric   Scalar
RMSEA               0.044           0.041        N/A      0.071
CFI                 0.98            0.99         N/A      0.94
TLI                 0.97            0.98         N/A      0.95


11 Please note that there was no convergence for the metric model. 




Table 12.19: Reliabilities for scale measuring teachers’ ICT self-efficacy 


Country                             Scale reliability (Cronbach’s alpha)
                                    T_ICTEFF
Chile                               0.74
Denmark                             0.73
Finland                             0.79
France                              0.80
Germany                             0.80
Italy                               0.82
Kazakhstan                          0.90
Korea, Republic of                  0.84
Luxembourg                          0.78
Moscow (Russian Federation)         0.77
North Rhine-Westphalia (Germany)    0.80
Portugal                            0.79
United States                       0.88
Uruguay                             0.82
ICILS 2018 average                  0.80

Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.


Table 12.20: Item parameters for scale measuring teachers’ ICT self-efficacy 


Scale or item  Question/item wording  Item parameters
Delta  Tau(1)  Tau(2)
T_ICTEFF  How well can you do these tasks using ICT?
T2G07A  Find useful teaching resources on the Internet  -1.18  0.70  -0.70
T2G07B  Contribute to a discussion forum/user group on the Internet (e.g. a wiki or blog)  0.58  -1.48  1.48
T2G07C  Produce presentations (e.g. [Power Point® or a similar program]), with simple animation functions  -0.15  -0.22  0.22
T2G07D  Use the Internet for online purchases and payments  -0.46  -0.36  0.36
T2G07E  Prepare lessons that involve the use of ICT by students  -0.52  -0.68  0.68
T2G07F  Using a spreadsheet program (e.g. [Microsoft Excel®]) for keeping records or analyzing data  0.51  -0.79  0.79
T2G07G  Assess student learning  -0.27  -0.94  0.94
T2G07H  Collaborate with others using shared resources such as [Google Docs®], [Padlet]  0.80  -1.24  1.24
T2G07  Use a learning management system (e.g. [Moodle], [Blackboard], [Edmodo])  0.69  -1.27  1.27




Teachers’ emphasis on developing ICT skills and coding skills 
Teachers were asked in Question 9 to indicate the emphasis they had given to developing different ICT-based capabilities in students in the reference class (selecting from “strong emphasis,” “some emphasis,” “little emphasis,” or “no emphasis”). The nine items in the question were used to derive a scale of teachers’ emphasis on developing ICT capabilities in class (T_ICTEMP).

Similarly, Question 13 required teachers to indicate the emphasis they had given to teaching different skills related to coding in the reference class (using the same response options as in Question 9). The nine items in this question were used to derive a scale of teachers’ emphasis on teaching CT-related tasks (T_CODEMP). Higher scores on either scale correspond to greater emphasis on developing ICT-based capabilities and coding skills, respectively.


Figure 12.12: Confirmatory factor analysis of items measuring teachers’ emphasis on learning of ICT and coding tasks




Model fit indices   Pooled sample   Multiple-group models
                                    Configural   Metric   Scalar
RMSEA               0.066           0.063        N/A      0.075
CFI                 0.96            0.96         N/A      0.93
TLI                 0.95            0.96         N/A      0.94




Figure 12.12 illustrates the results of the confirmatory factor analysis assuming a two-dimensional model with items from the two scales. The model fit was highly satisfactory. When reviewing measurement invariance using multiple-group models with different constraints, the model fit changed only marginally, which indicates a relatively high degree of invariance for this model. There was a high correlation between the two latent factors in the model (0.60). Both scales were highly reliable (see Table 12.21): the average reliability (Cronbach’s alpha) across countries was 0.90 for both (ranging from 0.86 to 0.94 for T_ICTEMP, and from 0.88 to 0.93 for T_CODEMP). The item parameters for both scales are presented in Table 12.22.


Table 12.21: Reliabilities for scales measuring teachers’ emphasis on learning of ICT and coding tasks 


Country                             Scale reliability (Cronbach’s alpha)
                                    T_ICTEMP    T_CODEMP
Chile                               0.90        0.92
Denmark                             0.87        0.88
Finland                             0.91        0.88
France                              0.91        0.89
Germany                             0.90        0.89
Italy                               0.90        0.90
Kazakhstan                          0.89        0.91
Korea, Republic of                  0.92        0.93
Luxembourg                          0.93        0.92
Moscow (Russian Federation)         0.86        0.91
North Rhine-Westphalia (Germany)    0.90        0.89
Portugal                            0.92        0.91
United States                       0.94        0.92
Uruguay                             0.88        0.91
ICILS 2018 average                  0.90        0.90

Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.


Table 12.22: Item parameters for scales measuring teachers’ emphasis on learning of ICT and coding tasks 
Scale or item  Question/item wording  Item parameters
Delta  Tau(1)  Tau(2)  Tau(3)
T_ICTEMP  In your teaching of the reference class in this school year, how much emphasis have you given to developing the following ICT-based capabilities in your students?
T2G09A  To access information efficiently  -0.84  -2.12  -0.49  2.61
T2G09B  To display information for a given audience/purpose  -0.37  -2.00  -0.47  2.46
T2G09C  To evaluate the credibility of digital information  -0.20  -1.95  -0.25  2.20
T2G09D  To share digital information with others  0.09  -2.02  -0.33  2.35
T2G09E  To use computer software to construct digital work products (e.g. presentations, documents, images and diagrams)  -0.29  -1.61  -0.30  1.92
T2G09F  To provide digital feedback on the work of others (such as classmates)  1.23  -1.87  -0.22  2.09
T2G09G  To explore a range of digital resources when searching for information  -0.19  -1.87  -0.35  2.22
T2G09H  To provide references for digital information sources  0.31  -1.70  -0.34  2.04
T2G09I  To understand the consequences of making information publicly available online  0.26  -1.57  -0.19  1.76
T_CODEMP  In your teaching of the reference class this school year, how much emphasis have you given to teaching the following skills?
T2G13A  To display information in different ways  -1.26  -1.82  -0.70  2.52
T2G13B  To break a complex process into smaller parts  -0.74  -1.70  -0.61  2.30
T2G13C  To understand diagrams that describe or show real-world problems  -0.06  -1.48  -0.51  2.00
T2G13D  To plan tasks by setting out the steps needed to complete them  -0.75  -1.72  -0.48  2.20
T2G13E  To use tools making diagrams that help solve problems  0.71  -1.48  -0.41  1.89
T2G13F  To use simulations to help understand or solve real-world problems  0.72  -1.4  -0.43  1.84
T2G13G  To make flow diagrams to show the different parts of a process  1.31  -1.38  -0.39  1.77
T2G13H  To record and evaluate data to understand and solve a problem  0.11  -1.46  -0.54  1.99
T2G13I  To use real-world data to review and revise solutions to problems  -0.04  -1.43  -0.54  1.97




Teachers’ use of ICT for class activities 


Question 10 of the ICILS 2018 teacher questionnaire asked teachers to select how often students in their reference class used ICT for different activities (they could choose from “they do not engage in this activity,” “they never use ICT in this activity,” “they sometimes use ICT in this activity,” “they often use ICT in this activity,” or “they always use ICT in this activity”). The 14 items in the question were used to derive a scale of teachers’ use of ICT for classroom activities (T_CLASACT), where responses from teachers who indicated that their students did not engage in an activity were treated as missing values. Higher scale scores corresponded to more frequent use of ICT for these activities.
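The key point of this coding rule is that “they do not engage in this activity” is treated as missing rather than as the lowest frequency, so an activity that does not take place in a class does not pull that teacher’s score down. The sketch below illustrates the rule; the 0 to 3 codes for the four frequency categories and the function name are assumptions for illustration, not the study’s data-processing code.

```python
# Response options for teacher Question 10, as quoted in the text
NOT_ENGAGED = "they do not engage in this activity"
FREQUENCY_CODES = {
    "they never use ICT in this activity": 0,
    "they sometimes use ICT in this activity": 1,
    "they often use ICT in this activity": 2,
    "they always use ICT in this activity": 3,
}
# Case-insensitive lookup table built from the labels above
_LOOKUP = {label.casefold(): code for label, code in FREQUENCY_CODES.items()}


def recode_activity(response):
    """Return a 0-3 frequency code, or None (missing) for non-engagement or omission.

    Raises KeyError for a label that is not one of the listed options.
    """
    if response is None:
        return None
    text = response.strip().casefold()
    if text == NOT_ENGAGED.casefold():
        return None
    return _LOOKUP[text]


teacher = [
    "They often use ICT in this activity",
    "they do not engage in this activity",
    None,
    "they never use ICT in this activity",
]
codes = [recode_activity(r) for r in teacher]
valid = [c for c in codes if c is not None]
print(codes, f"{len(valid)} of {len(codes)} items contribute to scaling")
```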


Figure 12.13 illustrates the results of the confirmatory factor analysis assuming a one-dimensional model with items from the scale. The analysis showed only marginally satisfactory model fit once residual variance for two items with similar content (items a and b) was taken into account. The model fit was equally marginally satisfactory for multiple-group models with different constraints, which suggests a relatively high level of measurement invariance. The average reliability (Cronbach’s alpha) of the scale was 0.94 (ranging from 0.92 to 0.96) (see Table 12.23). Table 12.24 shows the item parameters for the scale that were used to derive the IRT scale scores.

Figure 12.13: Confirmatory factor analysis of items measuring teachers’ use of ICT for class activities


Model fit indices   Pooled sample   Multiple-group models
                                    Configural   Metric   Scalar
RMSEA               0.091           0.097        0.095    0.095
CFI                 0.96            0.96         0.95     0.94
TLI                 0.96            0.95         0.95     0.95


Table 12.23: Reliabilities for scale measuring teachers’ use of ICT for class activities 
Country                             Scale reliability (Cronbach’s alpha)
                                    T_CLASACT
Chile                               0.94
Denmark                             0.92
Finland                             0.92
France                              0.94
Germany                             0.94
Italy                               0.92
Kazakhstan                          0.94
Korea, Republic of                  0.95
Luxembourg                          0.94
Moscow (Russian Federation)         0.92
North Rhine-Westphalia (Germany)    0.92
Portugal                            0.96
United States                       0.95
Uruguay                             0.95
ICILS 2018 average                  0.94

Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.
Table 12.24: Item parameters for scale measuring teachers’ use of ICT for class activities 
Scale or item  Question/item wording  Item parameters
Delta  Tau(1)  Tau(2)  Tau(3)
T_CLASACT  How often do students in your reference class use ICT for the following activities?
T2G10A  Work on extended projects (i.e. lasting over a week)  -0.73  -2.73  0.86  1.87
T2G10B  Work on short assignments (i.e. within one week)  -0.67  -2.91  0.56  2.35
T2G10C  Explain and discuss ideas with other students  0.73  -2.54  0.31  2.23
T2G10D  Submit completed work for assessment  -0.32  -2.28  0.49  1.79
T2G10E  Work individually on learning materials at their own pace  -0.07  -2.67  0.42  2.25
T2G10F  Undertake open-ended investigations or field work  0.03  -2.69  0.59  2.10
T2G10G  Reflect on their learning experiences (e.g. by using a learning log)  0.99  -1.92  0.36  1.57
T2G10H  Communicate with students in other schools on projects  0.63  -2.19  0.27  1.92
T2G10  Plan a sequence of learning activities for themselves  0.97  -2.02  0.32  1.70
T2G10J  Analyze data  0.22  -2.48  0.54  1.94
T2G10  Evaluate information resulting from a search  0.11  -2.49  0.53  1.97
T2G10L  Collect data for a project  -0.90  -2.70  0.67  2.02
T2G10  Create visual products or videos  -0.76  -2.53  0.74  1.79
T2G10  Share products with other students  -0.22  -2.32  0.50  1.82




Teachers’ use of ICT for teaching practices 

In Question 11, respondents were asked to indicate how often they used ICT for 10 different practices related to teaching the reference class (response options were “I do not use this practice with the reference class,” “I never use ICT with this practice,” “I sometimes use ICT with this practice,” “I often use ICT with this practice,” and “I always use ICT with this practice”). Eight of the 10 items were used to derive the scale of teachers’ use of ICT for teaching practices in class (T_ICTPRAC), where responses from teachers who indicated that they did not use a practice with the reference class were treated as missing values. Higher scale scores corresponded to more frequent use of ICT with the different practices.

Figure 12.14 illustrates the results of the confirmatory factor analysis assuming a one-dimensional model with items from the scale. The model fit was satisfactory for the pooled dataset. When reviewing measurement invariance using multiple-group models with different constraints, the model fit changed only marginally, which indicates a relatively high degree of invariance for this model. The average reliability (Cronbach’s alpha) of the scale across countries was high (0.90), ranging from 0.86 to 0.93 (see Table 12.25). The item parameters used to derive the IRT scale scores are presented in Table 12.26.
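The invariance comparison described here can be made explicit by tabulating how much each fit index changes as constraints are added, relative to the configural model. The sketch below uses the values printed with Figure 12.14; the report describes these changes qualitatively and does not state a numerical cutoff, so none is applied. Names are illustrative.

```python
# Fit indices reported with Figure 12.14 (T_ICTPRAC), multiple-group models
FIT_FIGURE_12_14 = {
    "configural": {"RMSEA": 0.074, "CFI": 0.99, "TLI": 0.98},
    "metric": {"RMSEA": 0.076, "CFI": 0.98, "TLI": 0.98},
    "scalar": {"RMSEA": 0.080, "CFI": 0.97, "TLI": 0.98},
}


def fit_changes(fit, reference="configural"):
    """Change in each index relative to the reference (least constrained) model."""
    base = fit[reference]
    return {
        model: {index: round(values[index] - base[index], 3) for index in base}
        for model, values in fit.items()
        if model != reference
    }


for model, deltas in fit_changes(FIT_FIGURE_12_14).items():
    print(model, deltas)
```

Rounding to three decimals keeps floating-point noise out of the differences; the published indices themselves carry only two or three decimals.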


Figure 12.14: Confirmatory factor analysis of items measuring teachers’ use of ICT for teaching practices 




Model fit indices   Pooled sample   Multiple-group models
                                    Configural   Metric   Scalar
RMSEA               0.063           0.074        0.076    0.080
CFI                 0.99            0.99         0.98     0.97
TLI                 0.99            0.98         0.98     0.98


Table 12.25: Reliabilities for scale measuring teachers’ use of ICT for teaching practices 
Country                             Scale reliability (Cronbach’s alpha)
                                    T_ICTPRAC
Chile                               0.92
Denmark                             0.88
Finland                             0.86
France                              0.89
Germany                             0.88
Italy                               0.87
Kazakhstan                          0.92
Korea, Republic of                  0.92
Luxembourg                          0.89
Moscow (Russian Federation)         0.90
North Rhine-Westphalia (Germany)    0.86
Portugal                            0.91
United States                       0.91
Uruguay                             0.93
ICILS 2018 average                  0.89

Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.
Table 12.26: Item parameters for scale measuring teachers’ use of ICT for teaching practices 
Scale or item  Question/item wording  Item parameters
Delta  Tau(1)  Tau(2)  Tau(3)
T_ICTPRAC  How often do you use ICT in the following practices when teaching your reference class?
T2G11B  The provision of remedial or enrichment support to individual students or small groups of students  -0.20  -2.41  0.26  2.15
T2G11C  The support of student-led whole-class discussions and presentations  -0.34  -2.34  0.21  2.12
T2G11D  The assessment of students’ learning through tests  -0.10  -1.72  0.20  1.52
T2G11E  The provision of feedback to students on their work  0.21  -2.07  0.27  1.80
T2G11F  The reinforcement of learning of skills through repetition of examples  -0.28  -2.37  0.25  2.12
T2G11G  The support of collaboration among students  0.25  -2.28  0.23  2.05
T2G11H  The mediation of communication between students and experts or external mentors  0.77  -1.78  0.05  1.73
T2G11J  The support of inquiry learning  -0.31  -2.35  0.37  1.98




Teachers’ use of ICT tools in class

Question 12 of the ICILS 2018 teacher questionnaire asked teachers how often they used different ICT-related tools in the teaching of their reference classes. For each tool, respondents were asked to select either “never,” “in some lessons,” “in most lessons,” or “in every or almost every lesson.” Two scales were derived from the set of items: teachers’ use of digital learning tools (T_USETOOL) and teachers’ use of general utility software (T_USEUTIL). Higher scores on either scale reflect more frequent use of these types of tools.

Figure 12.15 illustrates the results of the confirmatory factor analysis assuming a two-dimensional model with items from the two scales. The model fit was satisfactory for the pooled dataset; however, with increasing constraints across the different multiple-group models the fit became unsatisfactory, a finding which suggests a certain degree of variation in measurement characteristics. The correlation between the two latent factors was quite high (0.74). The average reliabilities (Cronbach’s alpha) of the two scales across countries were 0.82 for T_USETOOL and 0.73 for T_USEUTIL (ranging from 0.71 to 0.91 for the former, and from 0.65 to 0.82 for the latter) (see Table 12.27). Table 12.28 shows the item parameters for each of the two scales that were used to derive the IRT scale scores.

Figure 12.15: Confirmatory factor analysis of items measuring teachers’ use of ICT tools in class




Model fit indices   Pooled sample   Multiple-group models
                                    Configural   Metric   Scalar
RMSEA               0.063           0.087        0.109    0.105
CFI                 0.96            0.94         0.88     0.87
TLI                 0.95            0.92         0.88     0.89


Table 12.27: Reliabilities for scales measuring teachers’ use of ICT tools in class 
Country                             Scale reliability (Cronbach’s alpha)
                                    T_USETOOL   T_USEUTIL
Chile                               0.87        0.81
Denmark                             0.71        0.66
Finland                             0.75        0.68
France                              0.79        0.65
Germany                             0.83        0.71
Italy                               0.79        0.78
Kazakhstan                          0.91        0.80
Korea, Republic of                  0.90        0.78
Luxembourg                          0.78        0.70
Moscow (Russian Federation)         0.89        0.82
North Rhine-Westphalia (Germany)    0.79        0.74
Portugal                            0.80        0.76
United States                       0.86        0.72
Uruguay                             0.86        0.78
ICILS 2018 average                  0.82        0.74

Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.
Table 12.28: Item parameters for scales measuring teachers’ use of ICT tools in class

Scale or item  Question/item wording  Item parameters
Delta  Tau(1)  Tau(2)  Tau(3)
T_USETOOL  How often did you use the following tools in your teaching of the reference class this school year?
T2G12A  Practice programs or apps where you ask students questions (e.g. [Quizlet, Kahoot], [mathfessor])  0.02  -1.87  0.61  1.27
T2G12B  Digital learning games  -0.16  -2.08  0.69  1.38
T2G12G  Concept mapping software (e.g. [Inspiration®], [Webspiration®])  0.54  -1.18  0.26  0.93
T2G12H  Simulations and modelling software (e.g. [NetLogo])  0.88  -0.72  -0.02  0.74
T2G12  A learning management system (e.g. [Edmodo], [Blackboard])  -1.03  -0.43  0.24  0.20
T2G12  Collaborative software (e.g. [Google Docs®], [OneNote], [Padlet])  -0.37  -1.25  0.29  0.96
T2G12  Interactive digital learning resources (e.g. learning objects)  -0.64  -1.35  0.24  1.11
T2G12  Graphing or drawing software  0.26  -1.15  0.18  0.97
T2G12O  e-portfolios (e.g. [VoiceThread])  0.58  -0.57  -0.09  0.66
T2G12Q  Social media (e.g. [Facebook, Twitter])  -0.08  -1.02  0.34  0.68
T_USEUTIL  How often did you use the following tools in your teaching of the reference class this school year?
T2G12C  Word-processor software (e.g. [Microsoft Word®])  -0.24  -1.93  0.52  1.41
T2G12D  Presentation software (e.g. [Microsoft PowerPoint®])  -0.25  -2.05  0.55  1.50
T2G12L  Computer-based information resources (e.g. topic-related websites, wikis, encyclopaedia)  0.10  -2.25  0.53  1.72
T2G12P  Digital contents linked with textbooks  0.39  -1.37  0.39  0.98




Teachers’ perceptions of ICT resources and teacher collaboration 


Question 14 of the ICILS 2018 teacher questionnaire asked teachers to provide their level of agreement or disagreement (“strongly agree,” “agree,” “disagree,” “strongly disagree”) with a series of statements about ICT resources at their schools. Question 15 requested respondents to rate their agreement or disagreement (“strongly agree,” “agree,” “disagree,” “strongly disagree”) with statements about ICT-related teacher collaboration at school. Two scales were derived from the items included in these two questions: teachers’ perceptions of the availability of computer resources at school (T_RESRC) and teachers’ perceptions of the collaboration between teachers when using ICT (T_COLICT). Higher scores on either scale reflect higher levels of agreement with the statements used for measurement.

Figure 12.16 illustrates the results of the confirmatory factor analysis assuming a two-dimensional model with items from the two scales. The model fit was marginally satisfactory for the pooled dataset after allowing residuals for two items to be correlated (items a and b in Question 14). Across different multiple-group models the fit remained marginally satisfactory, with the least good fit for the scalar model, which suggests a certain degree of variation in measurement characteristics. The correlation between the two latent factors was moderately positive (0.42). The average reliabilities (Cronbach’s alpha) of the two scales across countries were 0.87 for T_RESRC and 0.85 for T_COLICT (ranging from 0.83 to 0.93 for the former, and from 0.77 to 0.92 for the latter) (see Table 12.29). Table 12.30 shows the item parameters for each of the two scales that were used to derive the IRT scale scores.


Figure 12.16: Confirmatory factor analysis of items measuring teachers’ perceptions of ICT resources and 
teacher collaboration at school 


Model fit indices   Pooled sample   Multiple-group models
                                    Configural   Metric   Scalar
RMSEA               0.086           0.084        0.081    0.092
CFI                 0.96            0.97         0.97     0.95
TLI                 0.95            0.97         0.97     0.96


Table 12.29: Reliabilities for scales measuring teachers’ perceptions of ICT resources and teacher collaboration at school

Country                             Scale reliability (Cronbach’s alpha)
                                    T_RESRC     T_COLICT
Chile                               0.89        0.90
Denmark                             0.84        0.83
Finland                             0.83        0.86
France                              0.85        0.86
Germany                             0.89        0.77
Italy                               0.87        0.89
Kazakhstan                          0.93        0.85
Korea, Republic of                  0.90        0.89
Luxembourg                          0.84        0.85
Moscow (Russian Federation)         0.88        0.88
North Rhine-Westphalia (Germany)    0.87        0.81
Portugal                            0.86        0.86
United States                       0.90        0.92
Uruguay                             0.88        0.89
ICILS 2018 average                  0.87        0.85

Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.
Table 12.30: Item parameters for scales measuring teachers’ perceptions of ICT resources and teacher 
collaboration at school 
Scale or item Question/item wording Item parameters 
Delta Tau(1) Tau(2) Tau(3) 
T_RESRC To what extent do you agree or disagree with the following statements about using ICT in teaching at your school?
T2G14B My school has sufficient ICT equipment (e.g. computers). -0.42 -2.15 -0.12 2.26
T2G14C The computer equipment in our school is up-to-date. -0.30 -2.29 -0.22 2.51
T2G14D My school has access to sufficient digital learning resources (e.g. learning software or [apps]). -0.21 -2.50 -0.18 2.68
T2G14E My school has good connectivity (e.g. fast speed) to the Internet. -0.06 -2.12 -0.30 2.42
T2G14F There is enough time to prepare lessons that incorporate ICT. 0.69 -2.72 -0.05 2.77
T2G14G There is sufficient opportunity for me to develop expertise in ICT. 0.21 -2.82 -0.23 3.05
T2G14H There is sufficient technical support to maintain ICT resources. 0.09 -2.44 -0.31 2.76
T_COLICT To what extent do you agree or disagree with the following statements about your use of ICT in teaching and learning at your school?
T2G15A I work together with other teachers on improving the use of ICT in classroom teaching. 0.18 -3.62 -0.35 3.97
T2G15B I collaborate with colleagues to develop ICT-based lessons. 0.34 -3.87 -0.23 4.10
T2G15C I observe how other teachers use ICT in teaching. 0.01 -3.57 -0.69 4.26
T2G15D I discuss with other teachers how to use ICT in teaching topics. -0.24 -3.67 -0.80 4.47
T2G15E I share ICT-based resources with other teachers in my school. -0.29 -3.40 -0.62 4.03
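The Delta and Tau parameters in Table 12.30, and in the later item-parameter tables, are partial credit model parameters. The following is a minimal sketch, assuming a ConQuest-style parameterization in which the probability of a response in category x is proportional to exp of the sum over steps k ≤ x of (θ − δ − τ_k), and assuming scores 0 to 3 with higher categories meaning more agreement. Here θ is a respondent’s location on the logit scale; the function name is hypothetical.

```python
import math


def pcm_probabilities(theta: float, delta: float, taus: list[float]) -> list[float]:
    """Category probabilities for one item under the partial credit model.

    P(X = x) is proportional to exp(sum over k <= x of (theta - delta - tau_k));
    the empty sum for category 0 is zero.
    """
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + theta - delta - tau)
    top = max(logits)  # subtract the maximum to avoid overflow in exp()
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Item T2G14B ("My school has sufficient ICT equipment") as printed in Table 12.30.
DELTA, TAUS = -0.42, [-2.15, -0.12, 2.26]
for theta in (-2.0, 0.0, 2.0):
    probs = pcm_probabilities(theta, DELTA, TAUS)
    most_likely = probs.index(max(probs))
    print(f"theta={theta:+.1f}: most likely score {most_likely}")
```

With these parameters the most likely score moves from 1 to 2 to 3 as θ goes from -2 to 0 to 2 logits.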




Teachers’ reports on ICT-related professional learning 


Question 17 of the ICILS 2018 teacher questionnaire asked teachers to indicate their participation (“not at all,” “once only,” “more than once”) in different ICT-related professional learning activities. We derived two scales from the set of items: teacher participation in structured learning professional development related to ICT (T_PROFSTR) and teacher participation in reciprocal learning professional development related to ICT (T_PROFREC). Higher scores on either scale reflect higher levels of participation in these two types of professional learning activities.

Figure 12.17 illustrates the results of the confirmatory factor analysis assuming a two-dimensional model with items from the two scales. The model fit was satisfactory for the pooled dataset; however, with more constraints across the different multiple-group models the fit became only marginally satisfactory, a finding which suggests a certain degree of variation in measurement characteristics. The correlation between the two latent factors was quite high (0.69). The average reliabilities (Cronbach’s alpha) of the two scales across countries were 0.76 for T_PROFSTR and 0.69 for T_PROFREC (ranging from 0.63 to 0.87 for the former, and from 0.58 to 0.79 for the latter; see Table 12.27). Table 12.28 shows the item parameters for each of the two scales that were used to derive the IRT scale scores.


Figure 12.17: Confirmatory factor analysis of items measuring teachers’ participation in ICT-related professional learning


[Figure 12.17 diagram: two-factor CFA model with T_PROFSTR (items q17a to q17c, q17h and q17i) and T_PROFREC (items q17d to q17g); latent factor correlation 0.69]

Model fit indices:  Pooled sample   Multiple-group models
                                    Configural   Metric   Scalar
RMSEA               0.074           0.076        0.095    0.096
CFI                 0.96            0.97         0.94     0.92
TLI                 0.95            0.96         0.93     0.93
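The text reads a loss of fit as constraints are added (configural, then metric, then scalar) as evidence of some variation in measurement characteristics. One widely used rule of thumb, which is an assumption here and not a criterion stated in this report, flags a drop in CFI of more than 0.01 between successive models. A minimal sketch applied to the CFI values printed for Figure 12.17 (the function and dictionary names are hypothetical):

```python
# Multiple-group CFI values as printed for Figure 12.17, ordered from the
# least to the most constrained model.
CFI_FIG_12_17 = {"configural": 0.97, "metric": 0.94, "scalar": 0.92}


def flag_cfi_drops(cfi_by_model: dict[str, float], max_drop: float = 0.01) -> list[str]:
    """Return models whose CFI is more than max_drop below the preceding model's."""
    names = list(cfi_by_model)  # dicts keep insertion order (Python 3.7+)
    flagged = []
    for previous, current in zip(names, names[1:]):
        # A small tolerance keeps a drop of exactly 0.01 from being flagged
        # because of floating-point rounding.
        if cfi_by_model[previous] - cfi_by_model[current] > max_drop + 1e-9:
            flagged.append(current)
    return flagged


print(flag_cfi_drops(CFI_FIG_12_17))  # → ['metric', 'scalar']
```

Applied to the Figure 12.16 values (0.97, 0.97, 0.95), the same check flags only the scalar model, in line with the text’s remark that the scalar model fit least well there.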




Table 12.27: Reliabilities for scales measuring teachers’ participation in ICT-related professional learning

Country                           Scale reliability (Cronbach’s alpha)
                                  T_PROFSTR  T_PROFREC
Chile                             0.84       0.74
Denmark                           0.75       0.64
Finland                           0.63       0.67
France                            0.72       0.69
Germany                           0.72       0.62
Italy                             0.79       0.68
Kazakhstan                        0.87       0.79
Korea, Republic of                0.83       0.78
Luxembourg                        0.73       0.73
Moscow (Russian Federation)       0.80       0.76
North Rhine-Westphalia (Germany)  0.64       0.58
Portugal                          0.75       0.65
United States                     0.82       0.77
Uruguay                           0.80       0.74
ICILS 2018 average                0.76       0.69
Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.


Table 12.28: Item parameters for scales measuring teachers’ participation in ICT-related professional learning

Scale or item Question/item wording Item parameters
Delta Tau(1) Tau(2)
T_PROFSTR How often have you participated in any of the following professional learning activities in the past two years?
T2G17A A course on ICT applications (e.g. word processing, presentations, internet use, spreadsheets, databases) -0.62 -0.52 0.52
T2G17B A course or webinar on integrating ICT into teaching and learning -0.34 -0.55 0.55
T2G17C Training on subject-specific digital teaching and learning resources -0.45 -0.70 0.70
T2G17H A course on use of ICT for [students with special needs or specific learning difficulties] 0.84 -0.24 0.24
T2G17I A course on how to use ICT to support personalized learning by students 0.58 -0.31 0.31
T_PROFREC How often have you participated in any of the following professional learning activities in the past two years?
T2G17D Observations of other teachers using ICT in teaching -0.40 0.10 -0.10
T2G17E An ICT-mediated discussion or forum on teaching and learning 0.39 0.19 -0.19
T2G17F The sharing of digital teaching and learning resources with others through a collaborative workspace -0.31 0.08 -0.08
T2G17G Use of a collaborative workspace to jointly evaluate student work 0.33 0.31 -0.31
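For three-category items such as those in Table 12.28, adding Delta to each Tau gives the item’s step parameters: the points on the θ scale where adjacent score categories are equally likely. A minimal sketch, assuming the scores 0, 1 and 2 for “not at all,” “once only” and “more than once”; the helper names are hypothetical. When the second step lies below the first, as it does for all four T_PROFREC items, the middle category is never the single most likely response under the partial credit model.

```python
def step_parameters(delta: float, taus: list[float]) -> list[float]:
    """Step parameters delta + tau_k of one partial credit item."""
    return [delta + tau for tau in taus]


def steps_ordered(delta: float, taus: list[float]) -> bool:
    """True if every step is located above the previous one."""
    steps = step_parameters(delta, taus)
    return all(lower < upper for lower, upper in zip(steps, steps[1:]))


# Two items as printed in Table 12.28: (Delta, [Tau(1), Tau(2)]).
ITEMS = {
    "T2G17A": (-0.62, [-0.52, 0.52]),  # T_PROFSTR item
    "T2G17D": (-0.40, [0.10, -0.10]),  # T_PROFREC item
}
for code, (delta, taus) in ITEMS.items():
    steps = [round(s, 2) for s in step_parameters(delta, taus)]
    print(code, steps, "ordered" if steps_ordered(delta, taus) else "reversed")
```

This gives ordered steps for T2G17A (about -1.14 and -0.10) and reversed steps for T2G17D (about -0.30 and -0.50).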


Teachers’ perceptions of positive and negative outcomes of using ICT for teaching and learning


Question 18 of the ICILS 2018 teacher questionnaire asked teachers to provide their level of agreement or disagreement (“strongly agree,” “agree,” “disagree,” “strongly disagree”) with a series of statements about positive and negative outcomes of using ICT for teaching and learning. Two scales were derived from the set of items: teachers’ perceptions of positive outcomes when using ICT in teaching and learning (T_VWPOS) and teachers’ perceptions of negative outcomes when using ICT in teaching and learning (T_VWNEG). Higher scores on either scale reflect higher levels of agreement with the statements in that scale.

Figure 12.18 illustrates the results of the confirmatory factor analysis assuming a two-dimensional model with items from the two scales. The model fit was satisfactory for the pooled dataset; however, with increasing constraints across the different multiple-group models the fit became unsatisfactory, a finding which suggests a certain degree of variation in measurement characteristics. There was a moderately negative correlation between the two latent factors (-0.40).
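Because the questionnaire lists “strongly agree” first while higher scale scores mean more agreement, responses need to be scored so that the highest score goes to “strongly agree” before scaling. A minimal sketch of such a recoding, assuming responses stored as text labels, scores from 0 to 3, and missing answers represented as None (the mapping and function names are hypothetical). The same direction is used for both scales, so higher T_VWNEG scores mean stronger agreement with the negative statements.

```python
from typing import Optional

# Hypothetical scoring key: higher scores mean stronger agreement.
AGREEMENT_SCORES = {
    "strongly disagree": 0,
    "disagree": 1,
    "agree": 2,
    "strongly agree": 3,
}


def score_agreement(responses: list[Optional[str]]) -> list[Optional[int]]:
    """Map agreement labels to item scores 0-3; missing answers stay None."""
    return [
        None if label is None else AGREEMENT_SCORES[label.strip().lower()]
        for label in responses
    ]


print(score_agreement(["Strongly agree", "disagree", None]))  # → [3, 1, None]
```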


Figure 12.18: Confirmatory factor analysis of items measuring teachers’ perceptions of positive and negative outcomes of using ICT for teaching and learning


[Figure 12.18 diagram: two-factor CFA model with T_VWPOS and T_VWNEG and their Question 18 items; latent factor correlation -0.40]

Model fit indices:  Pooled sample   Multiple-group models
                                    Configural   Metric   Scalar
RMSEA               0.063           0.094        N/A      0.092
CFI                 0.96            0.94         N/A      0.91
TLI                 0.95            0.93         N/A      0.93


The average reliabilities (Cronbach’s alpha) of the two scales across countries were 0.83 for T_VWPOS and 0.80 for T_VWNEG (ranging from 0.79 to 0.87 for the former, and from 0.76 to 0.85 for the latter; see Table 12.33). Table 12.34 shows the item parameters for each of the two scales that were used to derive the IRT scale scores.




Table 12.33: Reliabilities for scales measuring teachers’ perceptions of positive and negative outcomes of using ICT for teaching and learning

Country                           Scale reliability (Cronbach’s alpha)
                                  T_VWPOS    T_VWNEG
Chile                             0.86       0.81
Denmark                           0.81       0.77
Finland                           0.79       0.76
France                            0.82       0.80
Germany                           0.82       0.80
Italy                             0.86       0.85
Kazakhstan                        0.87       0.77
Korea, Republic of                0.83       0.82
Luxembourg                        0.80       0.80
Moscow (Russian Federation)       0.84       0.82
North Rhine-Westphalia (Germany)  0.83       0.81
Portugal                          0.84       0.82
United States                     0.87       0.81
Uruguay                           0.87       0.83
ICILS 2018 average                0.83       0.80
Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.


Table 12.34: Item parameters for scales measuring teachers’ perceptions of positive and negative outcomes 
of using ICT for teaching and learning 




Scale or item Question/item wording Item parameters 
Delta Tau(1) Tau(2) Tau(3) 
T_VWPOS To what extent do you agree or disagree with the following practices and principles in relation to the use of ICT in teaching and learning?
T2G18B Helps students develop greater interest in learning. -0.67 -2.84 -1.01 3.85
T2G18C Helps students to work at a level appropriate to their learning needs. -0.42 -3.51 -0.66 4.17
T2G18E Helps students develop problem solving skills. 0.29 -3.66 -0.63 4.29
T2G18J Enables students to collaborate more effectively. 0.27 -3.93 -0.49 4.42
T2G18K Helps students develop skills in planning and self-regulation of their work. 0.61 -3.88 -0.41 4.29
T2G18L Improves academic performance of students. 0.73 -3.89 -0.41 4.30
T2G18M Enables students to access better sources of information. -0.82 -2.87 -1.06 3.93
T_VWNEG To what extent do you agree or disagree with the following practices and principles in relation to the use of ICT in teaching and learning?
T2G18A Impedes concept formation by students. 0.93 -2.64 0.82 1.82
T2G18D Results in students copying material from Internet sources. -0.99 -3.05 -0.02 3.07
T2G18F Distracts students from learning. 0.33 -3.26 0.55 2.71
T2G18G Results in poorer written expression among students. -0.25 -2.93 0.29 2.63
T2G18H Results in poorer calculation and estimation skills among students. 0.09 -3.28 0.57 2.71
T2G18I Limits the amount of personal communication among students. -0.10 -3.04 0.50 2.54




School questionnaires 


School principals’ use of ICT 




The school principal questionnaire asked respondents to indicate how often they used ICT for different school-related activities (for each item they could select from “never,” “less than once a month,” “at least once a month but not every week,” “at least once a week but not every day,” and “every day”). Nine of the items were used to derive the scale principals’ use of ICT for general school-related activities (P_ICTUSE) and four of the remaining five items were used to derive the scale principals’ use of ICT for school-related communication activities (P_ICTCOM). Higher scores on these two scales represent more frequent use of ICT.

A confirmatory factor analysis assuming a two-dimensional model with items from the two scales (see Figure 12.19) showed a satisfactory fit for the pooled ICILS 2018 sample. The two latent factors in the model were strongly correlated (0.70). Reliabilities for the two scales were satisfactory in most countries (see Table 12.35). P_ICTUSE had an average reliability of 0.79 (ranging from 0.68 to 0.87) and P_ICTCOM had an average reliability of 0.70 (ranging from 0.47 to 0.83). Table 12.36 records the IRT parameters for each of the two scales.

Figure 12.19: Confirmatory factor analysis of items measuring school principals’ use of ICT


[Figure 12.19 diagram: two-factor CFA model with P_ICTUSE (items q02b to q02e and q02j to q02n) and P_ICTCOM (items q02f to q02i); latent factor correlation 0.70]

Model fit indices:  Pooled sample
RMSEA               0.058
CFI                 0.92
TLI                 0.90




Table 12.35: Reliabilities for scales measuring principals’ use of ICT

Country                           Scale reliability (Cronbach’s alpha)
                                  P_ICTUSE   P_ICTCOM
Chile                             0.85       0.74
Denmark                           0.77       0.54
Finland                           0.73       0.78
France                            0.73       0.76
Germany                           0.81       0.67
Italy                             0.81       0.66
Kazakhstan                        0.80       0.81
Korea, Republic of                0.87       0.82
Luxembourg                        0.78       0.47
Moscow (Russian Federation)       0.81       0.79
North Rhine-Westphalia (Germany)  0.68       0.62
Portugal                          0.76       0.71
United States                     0.83       0.83
Uruguay                           0.78       0.59
ICILS 2018 average                0.78       0.70
Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.


Table 12.36: Item parameters for scales measuring principals’ use of ICT

Scale or item Question/item wording Item parameters
Delta Tau(1) Tau(2) Tau(3)
P_ICTUSE How often do you use ICT for the following activities?
P2G02B Provide information about an educational issue through a website -0.15 -0.57 -0.66 1.24
P2G02C Look up records in a database (e.g. in a student information system) -0.91 -0.01 -0.56 0.57
P2G02D Maintain, organize and analyze data (e.g. with a spreadsheet or database) -0.55 -0.37 -0.43 0.80
P2G02E Prepare presentations 0.49 -1.31 -0.30 1.61
P2G02J Work with a learning management system (e.g. [Moodle]) 0.56 0.13 -0.36 0.23
P2G02K Use social media to communicate with the wider community about school-related activities 0.45 0.37 -0.81 0.44
P2G02L Management of staff (e.g. scheduling, professional development) -0.48 -0.46 -0.21 0.67
P2G02M Preparing the curriculum 0.47 -0.34 -0.32 0.66
P2G02N School financial management 0.11 -0.18 -0.35 0.53
P_ICTCOM How often do you use ICT for the following activities?
P2G02F Communicate with teachers in your school -1.15 -0.03 -0.78 0.81
P2G02G Communicate with education authorities -0.08 -1.19 -0.31 1.50
P2G02H Communicate with principals and senior staff in other schools 0.43 -1.14 -0.52 1.65
P2G02I Communicate with parents 0.80 -0.92 -0.59 1.51
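In the ConQuest-style parameterization of the partial credit model (an assumption about how these parameters were estimated), the Tau parameters of each item are constrained to sum to zero, so the printed values for each item should sum to zero up to rounding: at most about 0.015 when three values are rounded to two decimals. A quick check of this kind can catch transcription errors in tables like Table 12.36; the following minimal sketch uses the P_ICTUSE rows, and the names are hypothetical.

```python
# Tau(1) to Tau(3) for the P_ICTUSE items as printed in Table 12.36.
P_ICTUSE_TAUS = {
    "P2G02B": [-0.57, -0.66, 1.24],
    "P2G02C": [-0.01, -0.56, 0.57],
    "P2G02D": [-0.37, -0.43, 0.80],
    "P2G02E": [-1.31, -0.30, 1.61],
    "P2G02J": [0.13, -0.36, 0.23],
    "P2G02K": [0.37, -0.81, 0.44],
    "P2G02L": [-0.46, -0.21, 0.67],
    "P2G02M": [-0.34, -0.32, 0.66],
    "P2G02N": [-0.18, -0.35, 0.53],
}


def tau_sum_problems(taus_by_item: dict[str, list[float]], tolerance: float = 0.015) -> list[str]:
    """Return items whose printed taus are further from summing to zero than rounding allows."""
    return [
        item
        for item, taus in taus_by_item.items()
        if abs(sum(taus)) > tolerance + 1e-9  # small margin for float error
    ]


print(tau_sum_problems(P_ICTUSE_TAUS))  # → []
```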




School principals’ views on using ICT 
In Question 9 of the ICILS 2018 principal questionnaire, respondents were presented with seven different ICT-related outcomes of education in their school and were asked to rate their perceived level of importance (selecting from “very important,” “quite important,” “somewhat important,” and “not important”). The first six of the seven items in the question were used to derive the scale principals’ views on using ICT for educational outcomes (P_VWICT). Higher scale scores corresponded to higher levels of importance assigned to ICT-related skills as an outcome of learning.

Figure 12.20 presents the results of the confirmatory factor analysis. The one-dimensional model had a highly satisfactory fit for the pooled ICILS 2018 dataset. Table 12.37 shows the scale reliabilities (Cronbach’s alpha) across countries, which were quite high (on average 0.87, ranging from 0.75 to 0.96 across countries). Table 12.38 shows the IRT item parameters that were used to derive the final scale scores.

Figure 12.20: Confirmatory factor analysis of items measuring school principals’ views on using ICT


[Figure 12.20 diagram: one-factor CFA model with P_VWICT (items q09a to q09f)]

Model fit indices:  Pooled sample
RMSEA               0.033
CFI                 1.00
TLI                 1.00




Table 12.37: Reliabilities for scale measuring principals’ views on using ICT

Country                           Scale reliability (Cronbach’s alpha)
                                  P_VWICT
Chile                             0.96
Denmark                           0.81
Finland                           0.84
France                            0.89
Germany                           0.84
Italy                             0.86
Kazakhstan                        0.85
Korea, Republic of                0.89
Luxembourg                        0.82
Moscow (Russian Federation)       0.75
North Rhine-Westphalia (Germany)  0.87
Portugal                          0.85
United States                     0.93
Uruguay                           0.92
ICILS 2018 average                0.85
Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.


Table 12.38: Item parameters for scale measuring principals’ views on using ICT


Scale or item Question/item wording Item parameters 
Delta Tau(1) Tau(2) Tau(3) 
P_VWICT How important is each of the following outcomes of education in your school?
P2G09A The development of students’ basic computer skills (e.g. internet use, email, word processing, presentation software) -0.06 -2.61 -0.58 3.19
P2G09B The development of students’ skills in using ICT for collaboration with others 0.39 -4.10 0.30 3.80
P2G09C The use of ICT for facilitating students’ responsibility for their own learning 0.66 -3.83 0.21 3.62
P2G09D The use of ICT to augment and improve students’ learning 0.07 -4.16 0.22 3.94
P2G09E The development of students’ understanding and skills relating to safe and appropriate use of ICT -0.67 -3.54 -0.01 3.55
P2G09F The development of students’ proficiency in accessing and using information with ICT -0.39 -3.74 -0.07 3.81




School principals’ reports on expected ICT knowledge and skills of teachers 


In Question 11, school principals were asked whether teachers in their school were expected to acquire knowledge and skills in a range of different activities related to ICT. For each activity they were asked to select either “expected and required,” “expected but not required,” or “not expected.” Data from eight items were used to derive the scale principals’ reports on expectations of ICT use by teachers (P_EXPLRN) and data from the remaining three items provided the basis for the scale principals’ reports on expectations for teacher collaboration using ICT (P_EXPTCH). Higher scores on these two scales represent higher levels of expectations from the principals.

Figure 12.21 provides the results of the confirmatory factor analysis of data from the items in the question. The two-dimensional model had a marginally satisfactory fit for the pooled dataset. The results also showed a relatively high positive correlation between the two latent factors (0.69). Table 12.39 displays the scale reliabilities (Cronbach’s alpha) for the two scales. P_EXPLRN had an average reliability across countries of 0.78 (ranging from 0.66 to 0.89), whereas P_EXPTCH had an average reliability across countries of 0.70 (ranging from 0.59 to 0.86). Table 12.40 shows the IRT item parameters that we used to derive the final scale scores for each of the two scales.


Figure 12.21: Confirmatory factor analysis of items measuring principals’ reports on expected ICT knowledge and skills of teachers


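[Figure 12.21 diagram: two-factor CFA model with P_EXPLRN (items q11a to q11c and q11g to q11k) and P_EXPTCH (items q11d to q11f); latent factor correlation 0.69]

Model fit indices:  Pooled sample
RMSEA               0.083
CFI                 0.95
TLI                 0.93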




Table 12.39: Reliabilities for scales measuring principals’ reports on expected ICT knowledge and skills of teachers

Country                           Scale reliability (Cronbach’s alpha)
                                  P_EXPLRN   P_EXPTCH
Chile                             0.89       0.78
Denmark                           0.73       0.60
Finland                           0.74       0.63
France                            0.74       0.76
Germany                           0.77       0.64
Italy                             0.75       0.65
Kazakhstan                        0.78       0.73
Korea, Republic of                0.89       0.86
Luxembourg                        0.81       0.86
Moscow (Russian Federation)       0.76       0.69
North Rhine-Westphalia (Germany)  0.66       0.72
Portugal                          0.69       0.59
United States                     0.81       0.65
Uruguay                           0.81       0.65
ICILS 2018 average                0.77       0.71


Table 12.40: Item parameters for scales measuring principals’ reports on expected ICT knowledge and skills 
of teachers 


Scale or item Question/item wording Item parameters 
Delta Tau(1) Tau(2) 
P_EXPLRN Are teachers in your school expected to acquire knowledge and skills in each of the following activities?
P2G11A Integrate Web-based learning in their instructional practice -0.54 -2.19 2.19
P2G11B Use ICT-based forms of student assessment 0.04 -1.46 1.46
P2G11C Use ICT for monitoring student progress -0.01 -1.66 1.66
P2G11G Integrate ICT into teaching and learning -1.97 -2.66 2.66
P2G11H Use subject-specific digital learning resources (e.g. tutorials, simulation) -0.33 -2.19 2.19
P2G11I Use e-portfolios for assessment 1.60 -1.38 1.38
P2G11J Use ICT to develop authentic (real-life) assignments for students 0.94 -2.06 2.06
P2G11K Assess students’ [computer and information literacy] 0.27 -1.59 1.59
P_EXPTCH Are teachers in your school expected to acquire knowledge and skills in each of the following activities?
P2G11D Collaborate with other teachers via ICT -0.49 -2.70 2.70
P2G11E Communicate with parents via ICT 0.17 -1.77 1.77
P2G11F Communicate with students via ICT 0.32 -2.32 2.32




School principals’ reports on priorities for ICT use at schools 


School principals were asked in Question 15 to indicate the priority given to different ways of facilitating the use of ICT in teaching and learning. For each item they were asked to select “high priority,” “medium priority,” “low priority,” or “not a priority.” The data from the first three items in the question were used to derive the scale principals’ reports on priorities for facilitating use of ICT - hardware (P_PRIORH). The data from the remaining seven items were used to derive the scale of principals’ reports on priorities for facilitating use of ICT - support (P_PRIORS). Higher values on each scale reflect higher levels of perceived priority.

Figure 12.22 depicts the results from the confirmatory factor analysis of the scaled items from the question. There was a satisfactory fit for the two-factor model, and we found a high positive correlation between the two latent factors (0.61). The reliabilities (Cronbach’s alpha) of the scales for each country are presented in Table 12.41. On average, the reliability of P_PRIORH across countries was 0.79 (ranging from 0.43 to 0.90), while the average reliability of P_PRIORS across countries was 0.84 (ranging from 0.77 to 0.92). Table 12.42 records the IRT item parameters used to scale these items.


Figure 12.22: Confirmatory factor analysis of items measuring school principals’ reports on priorities for ICT 
use at schools 


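[Figure 12.22 diagram: two-factor CFA model with P_PRIORH (items q15a to q15c) and P_PRIORS (items q15d to q15j); latent factor correlation 0.61]

Model fit indices:  Pooled sample
RMSEA               0.071
CFI                 0.96
TLI                 0.95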




Table 12.41: Reliabilities for scales measuring school principals’ reports on priorities for ICT use at schools

Country                           Scale reliability (Cronbach’s alpha)
                                  P_PRIORH   P_PRIORS
Chile                             0.89       0.90
Denmark                           0.72       0.77
Finland                           0.73       0.79
France                            0.79       0.82
Germany                           0.68       0.86
Italy                             0.76       0.83
Kazakhstan                        0.90       0.86
Korea, Republic of                0.88       0.92
Luxembourg                        0.88       0.82
Moscow (Russian Federation)       0.83       0.82
North Rhine-Westphalia (Germany)  0.43       0.85
Portugal                          0.78       0.82
United States                     0.80       0.84
Uruguay                           0.73       0.88
ICILS 2018 average                0.79       0.84
Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.


Table 12.42: Item parameters for scales measuring school principals’ reports on priorities for ICT use at schools

Scale or item Question/item wording Item parameters
Delta Tau(1) Tau(2) Tau(3)
P_PRIORH At your school, what priority is given to the following ways of facilitating the use of ICT in teaching and learning?
P2G15A Increasing the numbers of computers per student in the school 0.09 -1.40 -0.49 1.88
P2G15B Increasing the number of computers connected to the Internet 0.12 -0.82 -0.61 1.43
P2G15C Increasing the bandwidth of Internet access for the computers connected to the Internet -0.21 -0.70 -0.69 1.39
P_PRIORS At your school, what priority is given to the following ways of facilitating the use of ICT in teaching and learning?
P2G15D Increasing the range of digital learning resources available for teaching and learning -0.99 -1.95 -0.36 2.32
P2G15E Establishing or enhancing an online learning support platform 0.24 -1.62 -0.25 1.87
P2G15F Supporting participation in professional development on pedagogical use of ICT -0.61 -1.83 -0.18 2.01
P2G15G Increasing the availability of qualified technical personnel to support the use of ICT 0.11 -1.22 -0.15 1.37
P2G15H Providing teachers with incentives to integrate ICT use in their teaching 0.26 -1.16 -0.34 1.50
P2G15I Providing more time for teachers to prepare lessons in which ICT is used 1.43 -1.28 -0.18 1.46
P2G15J Increasing the professional learning resources for teachers in the use of ICT -0.15 -1.70 -0.38 2.08




Availability of digital resources at school 

Questions 4 and 5 of the school ICT coordinator questionnaire asked respondents to indicate the availability of different technology and software resources in their school, with the response options “available to teachers and students,” “available only to teachers,” “available only to students,” or “not available.” Thirteen items across the two questions were used to derive the scale ICT coordinators’ reports on availability of ICT resources at school (C_ICTRES). Higher scores correspond to greater availability of ICT-related resources.


Figure 12.23 presents the results of the confirmatory factor analysis. The one-dimensional model had a satisfactory RMSEA (0.048), although its CFI (0.89) and TLI (0.87) fell below 0.90. Table 12.43 shows the scale reliabilities (Cronbach’s alpha), which were relatively satisfactory, with an average across countries of 0.74 (ranging from 0.57 to 0.80). Table 12.44 shows the IRT parameters that were used to derive the scale scores.


Figure 12.23: Confirmatory factor analysis of items measuring school ICT coordinators’ reports on the availability of digital resources at school


[Figure 12.23 diagram: one-factor CFA model with C_ICTRES (items q04b to q04e, q05a to q05c, q05f to q05h, q05j, q05l and q05m)]

Model fit indices:  Pooled sample
RMSEA               0.048
CFI                 0.89
TLI                 0.87




Table 12.43: Reliabilities for scale measuring school ICT coordinators’ reports on the availability of digital resources at school

Country                           Scale reliability (Cronbach’s alpha)
                                  C_ICTRES
Chile                             0.80
Denmark                           0.68
Finland                           0.71
France                            0.71
Germany                           0.79
Italy                             0.78
Kazakhstan                        0.80
Korea, Republic of                0.79
Luxembourg                        0.62
Moscow (Russian Federation)       0.78
North Rhine-Westphalia (Germany)  0.57
Portugal                          0.74
United States                     0.70
Uruguay                           0.75
ICILS 2018 average                0.74


Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, 
excluding benchmarking participants. 


Table 12.44: Item parameters for scale measuring ICT coordinators’ reports on the availability of digital resources at school
Scale or item Question/item wording Item parameters 
Delta Tau(1) Tau(2) 
C_ICTRES Please indicate the availability of the following technology resources in your school.
I2G04B Digital learning resources that can only be used online -0.94 1.14 -1.14
I2G04C Access to the Internet through the school network -1.69 0.07 -0.07
I2G04D Access to an education site or network maintained by education authorities -0.29 0.52 -0.52
I2G04E Email accounts for school-related use -0.24 -0.49 0.49
I2G05A Practice programs or [apps] where teachers decide which questions are asked of students (e.g. [Quizlet, Kahoot], [mathfessor]) 0.17 0.93 -0.93
I2G05B Single user digital learning games (e.g. [languages online]) 0.35 1.48 -1.48
I2G05C Multi-user digital learning games with graphics and inquiry tasks (e.g. [Quest Atlantis]) 1.16 1.68 -1.68
I2G05F Video and photo software for capture and editing (e.g. [Windows Movie Maker, iMovie, Adobe Photoshop]) -0.96 1.10 -1.10
I2G05G Concept mapping software (e.g. [Inspiration ®], [Webspiration ®]) 0.57 1.39 -1.39
I2G05H Data logging and monitoring tools (e.g. [Logger Pro]) that capture real-world data digitally for analysis (e.g. speed, temperature) 1.47 1.09 -1.09
I2G05J A learning management system (e.g. [Edmodo], [Blackboard]) 0.00 1.21 -1.21
I2G05L e-portfolios (e.g. [VoiceThread]) 0.83 1.25 -1.25
I2G05M Digital contents linked with textbooks -0.42 0.56 -0.56




Hindrances to the use of ICT for teaching and learning at school 


In Question 13 of the ICILS 2018 ICT coordinator questionnaire, respondents were asked to indicate the extent to which teaching and learning at their school is hindered by different resource-related obstacles. For each of the 14 obstacles, they could rate their impact as “a lot,” “to some extent,” “very little,” or “not at all.” Data from the first six items in the question were used to derive the scale ICT coordinators’ reports on computer resource hindrances to the use of ICT in teaching and learning (C_HINRES). Data from six of the remaining eight items were used to derive the scale ICT coordinators’ reports on pedagogical resource hindrances to the use of ICT in teaching and learning (C_HINPED). Higher scale scores correspond to greater perceived hindrance of the use of ICT for teaching and learning in schools.

Figure 12.24 illustrates the results of the confirmatory factor analysis assuming a two-dimensional model with the scaled items. The model had a satisfactory fit for the pooled ICILS 2018 dataset, and there was a strong correlation between the latent factors (0.64).


Figure 12.24: Confirmatory factor analysis of items measuring ICT coordinators’ reports on hindrances to the 
use of ICT for teaching and learning at school 


[Figure 12.24 diagram: two-factor CFA model with C_HINRES (items q13a to q13f) and C_HINPED (six of items q13g to q13n); latent factor correlation 0.64]

Model fit indices:  Pooled sample
RMSEA               0.060
CFI                 0.96
TLI                 0.95


As evident in Table 12.45, the scale reliabilities for both scales were high. C_HINRES had an 
average reliability across countries of 0.81 (ranging from 0.69 to 0.88) and C_HINPED had an 
average reliability across countries of 0.79 (ranging from 0.68 to 0.91). Table 12.46 shows the 
item parameters for the scales that were used to derive the IRT scale scores. 
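The table notes state that the ICILS 2018 average covers the participating countries only, while the ranges quoted in the text appear to span all participants, benchmarking participants included (for example, the 0.43 minimum reported for P_PRIORH is North Rhine-Westphalia). The following is a minimal sketch of both summaries for C_HINPED, using the values printed in Table 12.45. Taking a simple unweighted mean of the rounded values is an assumption; for this scale it reproduces the reported average of 0.79 and range of 0.68 to 0.91. The names are hypothetical.

```python
# Cronbach's alpha for C_HINPED as printed in Table 12.45.
C_HINPED_ALPHA = {
    "Chile": 0.83, "Denmark": 0.81, "Finland": 0.68, "France": 0.80,
    "Germany": 0.73, "Italy": 0.69, "Kazakhstan": 0.86,
    "Korea, Republic of": 0.91, "Luxembourg": 0.69,
    "Moscow (Russian Federation)": 0.88,
    "North Rhine-Westphalia (Germany)": 0.76,
    "Portugal": 0.86, "United States": 0.88, "Uruguay": 0.77,
}
BENCHMARKING = {"Moscow (Russian Federation)", "North Rhine-Westphalia (Germany)"}


def summarize_reliabilities(
    alphas: dict[str, float], benchmarking: set[str]
) -> tuple[float, float, float]:
    """Average over participating countries only; range over all participants."""
    countries = [a for name, a in alphas.items() if name not in benchmarking]
    average = sum(countries) / len(countries)
    return average, min(alphas.values()), max(alphas.values())


average, low, high = summarize_reliabilities(C_HINPED_ALPHA, BENCHMARKING)
print(f"average {average:.2f}, range {low:.2f} to {high:.2f}")  # → average 0.79, range 0.68 to 0.91
```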




Table 12.45: Reliabilities for scales measuring ICT coordinators’ reports on hindrances to the use of ICT for teaching and learning at school

Country                           Scale reliability (Cronbach’s alpha)
                                  C_HINRES   C_HINPED
Chile                             0.85       0.83
Denmark                           0.80       0.81
Finland                           0.72       0.68
France                            0.81       0.80
Germany                           0.76       0.73
Italy                             0.81       0.69
Kazakhstan                        0.88       0.86
Korea, Republic of                0.86       0.91
Luxembourg                        0.69       0.69
Moscow (Russian Federation)       0.79       0.88
North Rhine-Westphalia (Germany)  0.70       0.76
Portugal                          0.81       0.86
United States                     0.88       0.88
Uruguay                           0.84       0.77
ICILS 2018 average                0.81       0.79


Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.


Table 12.46: Item parameters for scales measuring ICT coordinators’ reports on hindrances to the use of ICT for teaching and learning at school


Scale or item Question/item wording Item parameters 
Delta Tau(1) Tau(2) Tau(3) 
C_HINRES To what extent is the use of ICT in teaching and learning at your school hindered by each of the following obstacles? 
2G13A Too few computers with an Internet connection 0.76 -0.57 -0.51 1.08 
2G13B nsufficient Internet bandwidth or speed -0.26 -0.68 -0.18 0.85 
2G613C ot enough computers for instruction -0.10 -0.69 -0.40 1.08 
2G613D Lack of sufficiently powerful computers -0.23 -0.99 -0.08 1.07 
2G13E Problems in maintaining ICT equipment -0.16 -1.30 0.08 1.22 
2G13F ot enough computer software 0.00 -1.47 0.04 1.43 
C_HINPED To what extent is the use of ICT in teaching and learning at your school hindered by each of the following obstacles? 
2G613G nsufficient ICT skills among teachers -0.26 -1.97 -0.29 2.26 
2G13H nsufficient time for teachers to prepare lessons -0.24 -1.50 -0.26 1.77 
2613 Lack of effective professional learning resources for -0.10 -1.65 -0.18 1.83 
eachers 
26135 Lack of an effective online learning support platform 0.35 -1.46 0.02 1.44 
2613 Lack of incentives for teachers to integrate ICT use -0.03 -1.39 -0.25 1.64 
in their teaching 
2613 nsufficient pedagogical support for the use of ICT 0.28 -1.64 -0.22 1.86 
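The Delta and Tau columns in Table 12.46 are partial credit model parameters (Masters and Wright 1997). The sketch below assumes the ConQuest-style parameterization in which step k of an item has difficulty delta + tau(k) and category 0 contributes an empty sum; `pcm_probabilities` is a hypothetical helper, and theta is a latent value in logits rather than the final reporting metric. For every item in the table the three tau values sum to (approximately) zero, which the sketch does not require but which makes the lowest and highest categories equally likely when theta equals delta.

```python
import math


def pcm_probabilities(theta, delta, taus):
    """Category probabilities for one partial credit item (hypothetical helper).

    Assumes step k has difficulty delta + taus[k - 1] and that category 0
    contributes an empty sum, so P(x) is proportional to
    exp(sum_{k=1..x} (theta - delta - tau_k)).
    """
    log_numerators = [0.0]
    for tau in taus:
        log_numerators.append(log_numerators[-1] + (theta - delta - tau))
    top = max(log_numerators)  # subtract the maximum to avoid overflow
    terms = [math.exp(v - top) for v in log_numerators]
    total = sum(terms)
    return [term / total for term in terms]


# Item 2G13A from Table 12.46: delta = 0.76, taus = -0.57, -0.51, 1.08
for theta in (-2.0, 0.0, 2.0):
    probs = pcm_probabilities(theta, 0.76, [-0.57, -0.51, 1.08])
    print(theta, [round(p, 3) for p in probs])
```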




ICILS 2018 TECHNICAL REPORT 


References 


Adams, R. J., Wu, M. L., & Wilson, M. R. (2015). ACER ConQuest: Generalised Item Response Modelling 
Software [computer software]. Version 4. Camberwell, Australia: Australian Council for Educational 
Research (ACER). 


Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance
structures. Psychological Bulletin, 88(3), 588-606.
Bollen, K. A., & Long, J. S. (Eds.). (1993). Testing structural equation models. Newbury Park, CA: Sage.
Byrne, B. M. (2008). Testing for multigroup equivalence of a measuring instrument: A walk through the
process. Psicothema, 20(4), 872-882.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory
factor analysis in psychological research. Psychological Methods, 4(3), 272-299.


Fraillon, J., Ainley, J., Schulz, W., Friedman, T., & Duckworth, D. (2020). Preparing for life in a digital world. IEA 
International Computer and Information Literacy Study 2018 international report. Cham, Switzerland: Springer. 
https://www.springer.com/gp/book/9783030387808 

Ganzeboom, H. B. G., de Graaf, P. M., & Treiman, D. J. (1992). A standard international socioeconomic index
of occupational status. Social Science Research, 21, 1-56.

Horn, J. L., & McArdle, J. J. (1992). A practical and theoretical guide to measurement equivalence in aging
research. Experimental Aging Research, 18, 117-144.

Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional
criteria versus new alternatives. Structural Equation Modeling, 6(1), 1-55.

International Labour Organization. (2007). International Standard Classification of Occupations: ISCO-2008.
Geneva, Switzerland: International Labour Office.

Kaplan, D. (2009). Structural equation modeling: Foundations and extensions. Los Angeles, London, New Delhi,
& Singapore: SAGE.

Little, T. D. (1997). Mean and covariance structures (MACS) analyses of cross-cultural data: Practical and
theoretical issues. Multivariate Behavioral Research, 32(1), 53-76.
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample
size for covariance structure modeling. Psychological Methods, 1(2), 130-149.
Masters, G. N., & Wright, B. D. (1997). The partial credit model. In W. J. van der Linden & R. K. Hambleton
(Eds.), Handbook of modern item response theory (pp. 101-122). New York, Berlin, & Heidelberg: Springer.

Muthén, B. O., du Toit, S. H. C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic
estimating equations in latent variable modeling with categorical outcomes. Unpublished manuscript available
as MPLUS webnote. Retrieved from http://www.statmodel.com/bmuthen/articles/Article_075.pdf
Muthén, L. K., & Muthén, B. O. (2012). Mplus: Statistical analysis with latent variables. User's guide (Version
7). Los Angeles, CA: Muthén & Muthén.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York, NY: McGraw-Hill.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark:
Nielsen & Lydiche.


Schulz, W. (2009). Questionnaire construct validation in the International Civic and Citizenship Education 
Study. IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments, vol. 2, 113-135. 
Schulz, W. (2017). Scaling of questionnaire data in large-scale assessments. In P. Lietz, J. Cresswell, K. Rust, 
& R. Adams (Eds.), Implementation of large scale education assessments (pp. 384-410). Chichester, UK: John 
Wiley & Sons, Ltd. 

Schulz, W., & Friedman, T. (2011). Scaling procedures for ICCS questionnaire items. In W. Schulz, J. Ainley,
& J. Fraillon (Eds.), ICCS 2009 technical report (pp. 157-259). Amsterdam, The Netherlands: International
Association for the Evaluation of Educational Achievement (IEA).

Schulz, W., & Friedman, T. (2015). Scaling procedures for ICILS questionnaire items. In J. Fraillon, W. Schulz,
T. Friedman, J. Ainley, & E. Gebhardt (Eds.), International Computer and Information Literacy Study 2013
technical report (pp. 177-220). Amsterdam, The Netherlands: International Association for the Evaluation
of Educational Achievement (IEA).




Schulz, W., & Friedman, T. (2018). Scaling procedures for ICCS 2016 questionnaire items. In W. Schulz, 
R. Carstens, B. Losito, & J. Fraillon (Eds.), ICCS 2016 technical report (pp. 139-243). Amsterdam, the 
Netherlands: International Association for the Evaluation of Educational Achievement (IEA). 

Tucker, L., & MacCallum, R. (1997). Exploratory factor analysis. Unpublished manuscript. Retrieved from: 
http://www.unc.edu/~rcm/book/factornew.htm 

UNESCO. (2012). International Standard Classification of Education: ISCED 2011. Montreal, Canada:
UNESCO-UIS.
Warm, T.A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 
427-450. 




CHAPTER 13: 
The reporting of ICILS 2018 results 


Wolfram Schulz 


Overview 


This chapter describes the procedures used for reporting results in the ICILS 2018 international report. It illustrates the replication method used to estimate sampling variance, and how we computed the imputation variance of the computer and information literacy (CIL) and computational thinking (CT) scores. In the subsequent section, we describe how we conducted significance tests for differences between country and subgroup means or percentages, as well as between results from ICILS 2018 and 2013 for the four countries that participated in both surveys.

This chapter also explains how we conducted multiple regression analyses to explain variation in questionnaire scale scores reflecting teachers' emphasis on teaching CIL- and CT-related ICT skills, and how we estimated multilevel (hierarchical) models explaining variation in students' CIL and CT.


Estimation of sampling variance


As in the previous survey, ICILS 2018 employed two-stage cluster sampling procedures to obtain the student and teacher samples. During the first stage, schools were sampled from a sampling frame with a probability proportional to their size (see Chapter 6 for further details). During the second stage, students at the target grade were randomly sampled within schools across classrooms. Cluster sampling techniques permit an efficient and economic data collection. However, because these samples are not simple random samples and individuals within clusters tend to be more homogeneous than the population as a whole, it is not appropriate to apply the formulae for simple random samples when obtaining standard errors for population estimates.

Replication (re-sampling) techniques provide tools to estimate the sampling variance of population estimates more appropriately (Gonzalez and Foy 2000; Wolter 1985). For ICILS 2018, as in the previous cycle in 2013 (see Schulz 2015), we used the jackknife repeated replication (JRR) technique to compute standard errors for population means, percentages, regression coefficients, and any other population statistic.

Generally, the JRR method for stratified samples requires pairing primary sampling units (in this survey, schools) into sampling zones (or "pseudo-strata"). Because the assignment of schools to these sampling zones needs to be consistent with the sampling frame from which they were sampled, we constructed sampling zones within explicit strata. Whenever we found an odd number of schools within an explicit stratum or the sampling frame, we randomly divided the remaining school into two halves, thereby forming a sampling zone of two "quasi-schools."

Each of the countries participating in ICILS 2018 had up to 75 sampling zones (see Table 13.1), corresponding to the full sample size of 150 schools (two schools per zone). In countries where we had larger numbers of schools, we combined some schools into bigger "pseudo-schools" in order to keep the total number to 75. If a selected school was large enough to be selected with certainty, it was randomly split into two halves, which were paired. In Luxembourg, where a census of schools was surveyed, each of the 38 participating schools constituted a sampling zone and students were randomly assigned to two "quasi-schools."




Table 13.1: Number of jackknife zones in national samples 




Country                              Student data   Teacher data   School data
Chile                                75             15             75
Denmark                              72             69             72
Finland                              74             73             73
France                               75             65             75
Germany                              75             75             75
Italy                                75             74             73
Kazakhstan                           75             75             75
Korea, Republic of                   75             74             75
Luxembourg                           38             28             35
Moscow (Russian Federation)          75             75             75
North Rhine-Westphalia (Germany)     55             54             56
Portugal                             75             75             75
United States                        [illegible]    75             75
Uruguay                              75             62             [illegible]


Note: Benchmarking participants in italics.

Within each of the sampling zones, we randomly assigned one school a multiplication value of 2 and the other school a value of 0. For each of the sampling zones, we computed replicate weights. This meant that one of the paired schools had a contribution of 0, the second a double contribution, and all other schools remained the same. Replicate weights are derived by simply multiplying student, teacher, or school weights by the jackknife's multiplication value for the respective sampling zone while keeping student, teacher, or school weights for all other zones unchanged.

This process results in a replicate weight being added to the data file for each jackknife zone. Table 13.2 illustrates this procedure through a simple example featuring 24 students from six different schools (A-F) paired into three jackknife zones.

For each country sample, we computed 75 replicate weights regardless of the number of zones. In countries with fewer zones, the remaining replicate weights were set equal to the original sampling weight and therefore did not contribute to the sampling variance estimate.

Estimating the sampling variance for a statistic, $\mu$, involves computing it once with the sampling weights for the original sample and then with each of the 75 replicate weights separately. The sampling variance estimate $SV_\mu$ is computed using the formula:

$$SV_\mu = \sum_{i=1}^{75} \left(\mu_i - \mu\right)^2$$

Here, $\mu$ is the statistic estimated for the population through use of the original sampling weights and $\mu_i$ is the same statistic estimated by using the weights for the $i$-th of 75 jackknife replicates. The standard error $SE_\mu$ for statistic $\mu$, which reflects the uncertainty of the estimate due to sampling, is computed as:

$$SE_\mu = \sqrt{SV_\mu}$$


Sampling variance can be computed with jackknife repeated replication for any statistic, including means, percentages, standard deviations, correlations, regression coefficients, and mean differences.
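The replicate-weight scheme and the variance formula above can be sketched in a few lines. This is a minimal illustration, assuming that each unit carries its school's jackknife zone and replicate code as in Table 13.2 and that the statistic is a weighted mean; the helper names and the student scores are hypothetical.

```python
import math


def replicate_weights(weights, zones, codes, n_replicates=75):
    """Build jackknife replicate weights (hypothetical helper).

    weights[k]: original sampling weight of unit k
    zones[k]:   jackknife zone (1..75) of unit k's school
    codes[k]:   jackknife replicate code (0 or 1) of unit k's school
    For replicate r, units in zone r get weight * 2 (code 1) or weight * 0
    (code 0); all other units keep their original weight. Replicates with
    no matching zone simply equal the original weights.
    """
    return [
        [w * 2 * c if z == r else w for w, z, c in zip(weights, zones, codes)]
        for r in range(1, n_replicates + 1)
    ]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jrr_sampling_variance(values, weights, zones, codes, n_replicates=75):
    """SV = sum over replicates of (replicate estimate - full estimate)^2."""
    full = weighted_mean(values, weights)
    return sum(
        (weighted_mean(values, rw) - full) ** 2
        for rw in replicate_weights(weights, zones, codes, n_replicates)
    )


# The 24-student layout of Table 13.2 (six schools, four students each).
school_weight = {"A": 5.2, "B": 9.8, "C": 6.6, "D": 7.2, "E": 4.9, "F": 8.2}
school_zone = {"A": 1, "B": 1, "C": 2, "D": 2, "E": 3, "F": 3}
school_code = {"A": 0, "B": 1, "C": 1, "D": 0, "E": 1, "F": 0}
schools = [s for s in "ABCDEF" for _ in range(4)]
weights = [school_weight[s] for s in schools]
zones = [school_zone[s] for s in schools]
codes = [school_code[s] for s in schools]

reps = replicate_weights(weights, zones, codes)
scores = [float(k) for k in range(24)]  # hypothetical student scores
sv = jrr_sampling_variance(scores, weights, zones, codes)
print(round(math.sqrt(sv), 3))  # JRR standard error of the weighted mean
```

Replicates 4 to 75 reproduce the original weights here, so they add nothing to the variance, exactly as described for countries with fewer than 75 zones.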


THE REPORTING OF ICILS 2018 RESULTS 223 
Table 13.2: Example for computation of replicate weights

ID   Student   School   Jackknife   Jackknife        Multiplication   Replicate   Replicate   Replicate
     weight             zone        replicate code   value            weight 1    weight 2    weight 3
1    5.2       A        1           0                0                0           5.2         5.2
2    5.2       A        1           0                0                0           5.2         5.2
3    5.2       A        1           0                0                0           5.2         5.2
4    5.2       A        1           0                0                0           5.2         5.2
5    9.8       B        1           1                2                19.6        9.8         9.8
6    9.8       B        1           1                2                19.6        9.8         9.8
7    9.8       B        1           1                2                19.6        9.8         9.8
8    9.8       B        1           1                2                19.6        9.8         9.8
9    6.6       C        2           1                2                6.6         13.2        6.6
10   6.6       C        2           1                2                6.6         13.2        6.6
11   6.6       C        2           1                2                6.6         13.2        6.6
12   6.6       C        2           1                2                6.6         13.2        6.6
13   7.2       D        2           0                0                7.2         0           7.2
14   7.2       D        2           0                0                7.2         0           7.2
15   7.2       D        2           0                0                7.2         0           7.2
16   7.2       D        2           0                0                7.2         0           7.2
17   4.9       E        3           1                2                4.9         4.9         9.8
18   4.9       E        3           1                2                4.9         4.9         9.8
19   4.9       E        3           1                2                4.9         4.9         9.8
20   4.9       E        3           1                2                4.9         4.9         9.8
21   8.2       F        3           0                0                8.2         8.2         0
22   8.2       F        3           0                0                8.2         8.2         0
23   8.2       F        3           0                0                8.2         8.2         0
24   8.2       F        3           0                0                8.2         8.2         0


Standard statistical software does not always include procedures for replication techniques. For ICILS 2018, we used tailored Statistical Package for the Social Sciences (SPSS) software macros (IBM Corp 2017). These results can be replicated by using the IEA International Database (IDB) Analyzer, which is generally recommended as a tool for analyzing IEA data.¹ Alternatively, analysts can use other specialized software, such as WesVar (Westat 2007), tailored applications like the SPSS Replicates Module developed by ACER,² or procedures built into statistical software such as Stata (StataCorp 2013) or SAS (SAS Institute Inc. 2017).


1 The IDB Analyzer is an application that allows the user to combine and analyze data from IEA's large-scale assessments such as TIMSS, PIRLS, ICCS, and ICILS by creating SPSS or SAS syntax, which can be used with the respective statistical software. The application can be downloaded at https://www.iea.nl/data-tools/tools.

2 The module is an add-in component running under SPSS and offers a number of features for applying different replication methods when estimating sampling and imputation variance. The application can be downloaded at https://iccs.acer.org/iccs-2016-reports/.




Estimation of imputation variance for CIL and CT scores 


The estimation of sampling variance as described above is sufficient for any analysis not involving CIL or CT test scores. When estimating standard errors of estimates involving CIL and CT test scores, it is important to also take into account the imputation variance, which provides an estimate of measurement variance (see Chapter 11 for a description of the scaling methodology for ICILS 2018 test items). Therefore, population statistics for ICILS 2018 CIL and CT scores and their standard errors should always be estimated using all five plausible values.


If $P$ is the number of plausible values for the international CIL or CT score and $\mu_p$ is the statistic of interest computed on each plausible value $p$, then the statistic $\mu$ based on all plausible values can be computed as follows:

$$\mu = \frac{1}{P}\sum_{p=1}^{P} \mu_p$$


The sampling variance $SV_\mu$ is calculated as the average of the sampling variance for each plausible value, $SV_{\mu_p}$:

$$SV_\mu = \frac{1}{P}\sum_{p=1}^{P} SV_{\mu_p}$$


Use of the $P$ plausible values for data analysis also allows an estimation of the amount of error associated with the measurement of CIL and CT. The measurement variance or imputation variance $IV_\mu$ is computed as:

$$IV_\mu = \frac{1}{P-1}\sum_{p=1}^{P} \left(\mu_p - \mu\right)^2$$

Here, $\mu_p$ is the statistic of interest computed on each plausible value $p$ and $\mu$ is the mean statistic based on all $P$ plausible values.


The estimate of the total variance $TV_\mu$, consisting of sampling variance and imputation variance, can be computed as:

$$TV_\mu = SV_\mu + \left(1 + \frac{1}{P}\right) IV_\mu$$

The estimate of the final standard error $SE_\mu$ is equal to:

$$SE_\mu = \sqrt{TV_\mu}$$
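A sketch of how the sampling and imputation components combine for one statistic, assuming the full-sample estimate and the replicate estimates have already been computed for each plausible value (for example, with the JRR scheme described earlier). The function name and the toy numbers are hypothetical; real analyses would use five plausible values and 75 replicates.

```python
import math
from statistics import mean


def combine_plausible_values(estimates, replicate_estimates):
    """Combine sampling and imputation variance over P plausible values.

    estimates[p]:              statistic for plausible value p, full weights
    replicate_estimates[p][i]: the same statistic with replicate weight i
    Returns (mu, SV, IV, SE).
    """
    P = len(estimates)
    mu = mean(estimates)
    # JRR sampling variance for each plausible value, averaged over p
    sv = mean(
        sum((rep - est) ** 2 for rep in reps)
        for est, reps in zip(estimates, replicate_estimates)
    )
    # Imputation variance: variance between plausible values (divisor P - 1)
    iv = sum((est - mu) ** 2 for est in estimates) / (P - 1)
    total = sv + (1 + 1 / P) * iv
    return mu, sv, iv, math.sqrt(total)


estimates = [500.0, 502.0, 498.0, 501.0, 499.0]
replicate_estimates = [[e + 1.0, e - 1.0] for e in estimates]  # toy replicates
mu, sv, iv, se = combine_plausible_values(estimates, replicate_estimates)
print(mu, sv, iv, round(se, 3))
```

With these toy inputs the sampling variance is 2, the imputation variance is 2.5, and the total variance is 2 + 1.2 × 2.5 = 5.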


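The following formula illustrates the whole process of computing standard errors for a statistic $\mu$ based on $P$ plausible values and 75 replicates, where $\mu_{i,p}$ is the statistic for the $i$-th replicate of the $p$-th plausible value, and $\mu_p$ is the statistic for the $p$-th plausible value with the original sampling weights:

$$SE_\mu = \sqrt{\frac{1}{P}\sum_{p=1}^{P}\sum_{i=1}^{75}\left(\mu_{i,p} - \mu_p\right)^2 + \left(1 + \frac{1}{P}\right)\frac{1}{P-1}\sum_{p=1}^{P}\left(\mu_p - \mu\right)^2}$$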


Table 13.3 shows the average scale scores for CIL as well as their sampling and overall standard 
errors, while Table 13.4 displays the corresponding information for CT. The tables also record 
the number of students that were assessed in each country. The comparison between sampling 
and combined standard error shows that for both assessment domains most of the error was 
due to sampling and that (at the level of national samples) only a relatively small proportion was 
attributable to measurement error. 




Table 13.3: National averages for CIL with standard deviations, sampling, and overall errors 


Country                              Average     Sampling      Combined         Number of
                                     CIL score   error         standard error   assessed students
Chile                                476         3.71          3.73             3092
Denmark                              553         2.00          2.04             2404
Finland                              531         2.96          2.98             2546
France                               499         2.26          [illegible]      2940
Germany                              518         2.81          2.95             3655
Italy                                461         2.69          [illegible]      2810
Kazakhstan                           395         5.33          [illegible]      3371
Korea, Republic of                   542         2.95          [illegible]      2375
Luxembourg                           482         0.68          0.83             5401
Moscow (Russian Federation)          549         [illegible]   [illegible]      2852
North Rhine-Westphalia (Germany)     515         2.55          2.63             1991
Portugal                             516         [illegible]   [illegible]      [illegible]
United States*                       519         1.86          1.89             6790
Uruguay                              450         4.19          4.29             [illegible]
Notes: Benchmarking participants in italics. 
* Countries not meeting sample participation requirements. 
Table 13.4: National averages for CT with standard deviations, sampling, and overall errors 
Country                              Average    Sampling      Combined         Number of
                                     CT score   error         standard error   assessed students
Denmark                              527        2.29          2.32             2404
Finland                              508        [illegible]   [illegible]      2546
France                               501        2.31          2.38             2940
Germany                              486        3.54          3.63             3655
Korea, Republic of                   536        4.26          4.42             2875
Luxembourg                           460        0.86          0.89             5401
North Rhine-Westphalia (Germany)     485        2.87          2.96             1991
Portugal                             482        2.48          [illegible]      [illegible]
United States*                       498        2.48          2.54             6790

Notes: Benchmarking participants in italics.
* Countries not meeting sample participation requirements.


Reporting of differences


Differences in population estimates between and within countries 


We considered differences between two score averages (or percentages) $a$ and $b$ significant ($p < 0.05$) when the absolute value of the test statistic $t$ was greater than the critical value, 1.96. We calculated $t$ by dividing the difference by its standard error, $SE_{dif\_ab}$:

$$t = \frac{a - b}{SE_{dif\_ab}}$$




In the case of differences between score averages from independent samples (for example, comparisons of two country averages), the standard error of the difference $SE_{dif\_ab}$ can be computed as:

$$SE_{dif\_ab} = \sqrt{SE_a^2 + SE_b^2}$$

Here, $SE_a$ and $SE_b$ are the standard errors of the means from the two independent samples $a$ and $b$.
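As an illustration of the independent-samples case, the sketch below tests the difference between two national CIL averages using the combined standard errors from Table 13.3 (combined rather than sampling errors, because CIL scores involve plausible values). The helper name is hypothetical, and 1.96 is the two-sided critical value for $p < 0.05$.

```python
import math


def independent_difference_test(a, se_a, b, se_b, critical=1.96):
    """t test for a difference between two independent samples.

    Returns (difference, SE of difference, t, significant at p < 0.05).
    """
    se_dif = math.sqrt(se_a ** 2 + se_b ** 2)
    t = (a - b) / se_dif
    return a - b, se_dif, t, abs(t) > critical


# Denmark vs Finland: CIL averages and combined standard errors (Table 13.3)
dif, se_dif, t, significant = independent_difference_test(553, 2.04, 531, 2.98)
print(dif, round(se_dif, 2), round(t, 2), significant)
```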


The formula for calculating the standard error provided above is only suitable when the subsamples 
being compared are independent. Because subgroups (e.g., gender groups) within countries are 
typically not independent samples, we derived the difference between statistics for subgroups 
of interest and the standard error of the difference by using jackknife repeated replication that 
involved the following formula: 


$$SE_{dif\_ab} = \sqrt{\sum_{i=1}^{75}\left(\left(a_i - b_i\right) - \left(a - b\right)\right)^2}$$

Here, $a$ and $b$ represent the averages (or percentages) in each of the two subgroups for the fully weighted sample, and $a_i$ and $b_i$ are those for the replicate samples.
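For dependent subgroups, the difference is recomputed within each replicate sample, so any covariance between the two subgroups carries through to the standard error. A minimal sketch with made-up numbers (two replicates instead of 75); the function name is hypothetical.

```python
import math


def jrr_difference_se(a, b, a_reps, b_reps):
    """SE of a - b for two subgroups of the same sample (JRR).

    a, b: subgroup estimates with the full sampling weights
    a_reps, b_reps: the same estimates under each replicate weight
    Differencing within each replicate keeps the covariance between the
    subgroups, which sqrt(SE_a^2 + SE_b^2) would ignore.
    """
    full_dif = a - b
    return math.sqrt(sum(((ar - br) - full_dif) ** 2
                         for ar, br in zip(a_reps, b_reps)))


# Hypothetical subgroup averages (for example, girls and boys in one country)
print(round(jrr_difference_se(10.0, 8.0, [11.0, 10.0], [8.0, 9.0]), 4))
```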


In the case of differences in CIL and CT scores between dependent subsamples, we calculated the standard error of the differences with $P = 5$ plausible values by using this formula:

$$SE_{dif\_ab} = \sqrt{\frac{1}{P}\sum_{p=1}^{P}\sum_{i=1}^{75}\left(\left(a_{i,p} - b_{i,p}\right) - \left(a_p - b_p\right)\right)^2 + \left(1 + \frac{1}{P}\right)\frac{\sum_{p=1}^{P}\left(\left(a_p - b_p\right) - \left(\bar{a} - \bar{b}\right)\right)^2}{P-1}}$$

Here, $a_p$ and $b_p$ represent the weighted subgroup averages in groups $a$ and $b$ for each of the $P$ plausible values, $a_{i,p}$ and $b_{i,p}$ are the subgroup averages within replicate samples for each of the $P$ plausible values, and $\bar{a}$ and $\bar{b}$ are the means of the two weighted subgroup averages across the $P$ plausible values.


Comparisons between countries and ICILS averages 


The standard error of the ICILS 2018 average, $SE_{ICILS2018}$, was calculated based on the respective standard error for each of the national statistics ($SE_k$) and the number ($N$) of countries meeting IEA sample participation requirements that were included in the average (11 for student survey results, 7 for teacher survey results):

$$SE_{ICILS2018} = \frac{\sqrt{\sum_{k=1}^{N} SE_k^2}}{N}$$


When comparing the country means $c$ with the overall ICILS 2018 average $i$, we had to account for the fact that the country being considered had contributed to the international average and its standard error. We did this by calculating the standard error $SE_{dif\_ic}$ of the difference between the overall ICILS 2018 average and an individual country average as:

$$SE_{dif\_ic} = \frac{\sqrt{\left((N-1)^2 - 1\right)SE_c^2 + \sum_{k=1}^{N} SE_k^2}}{N}$$

Here, $SE_c$ is the sampling standard error for country $c$ and $SE_k$ is the sampling error for the $k$-th of $N$ participating and reported countries. We used this formula for determining the statistical significance of differences due to sampling error between countries and the ICILS 2018 averages of all questionnaire percentages or scale point averages throughout the ICILS 2018 reports.
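The two formulas above can be sketched as follows, assuming independent national estimates; the helper names are hypothetical. A convenient check: with equal national standard errors $s$, the difference formula reduces to $s\sqrt{(N-1)/N}$, which is smaller than the naive $\sqrt{SE_c^2 + SE_{ICILS2018}^2}$ because the country is part of the average it is compared with.

```python
import math


def se_icils_average(country_ses):
    """SE of the international average of N independent national estimates."""
    return math.sqrt(sum(se ** 2 for se in country_ses)) / len(country_ses)


def se_country_vs_average(se_c, country_ses):
    """SE of (country estimate - international average) when the country
    is one of the N estimates that make up the average."""
    n = len(country_ses)
    return math.sqrt(((n - 1) ** 2 - 1) * se_c ** 2
                     + sum(se ** 2 for se in country_ses)) / n


ses = [2.0, 2.0, 2.0, 2.0]  # hypothetical national standard errors
print(se_icils_average(ses), round(se_country_vs_average(2.0, ses), 4))
```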




When comparing the CIL and CT score averages of a country with the overall ICILS 2018 average, it was necessary to also take the imputation component of the standard errors into account. The imputation variance component of the standard error, $SE^2_{dif\_ic\_imp}$, was given as:

$$SE^2_{dif\_ic\_imp} = \left(1 + \frac{1}{P}\right)\frac{\sum_{p=1}^{P}\left(d_p - \bar{d}\right)^2}{P-1}$$

Here, $d_p$ is the difference between the overall ICILS 2018 average and the country mean for plausible value $p$, and $\bar{d}$ is the average of these differences across the $P$ plausible values.


The sampling error for CIL and CT scores was calculated as follows:

$$SE_{dif\_ic\_sam} = \frac{1}{N}\sqrt{\frac{1}{P}\sum_{p=1}^{P}\left[\left((N-1)^2 - 1\right)SE_{cp}^2 + \sum_{k=1}^{N} SE_{kp}^2\right]}$$

Here, $SE_{cp}$ is the sampling standard error for country $c$ and plausible value $p$, and $SE_{kp}$ is the sampling error for plausible value $p$ in the $k$-th of $N$ participating and reported countries.


We computed the final standard error ($SE_{dif\_icp}$) of the difference between national CIL and CT test scores and the ICILS 2018 averages as:

$$SE_{dif\_icp} = \sqrt{SE^2_{dif\_ic\_sam} + SE^2_{dif\_ic\_imp}}$$


Comparisons between benchmarking participants and ICILS averages 


When comparing averages for benchmarking participants, Moscow (Russian Federation) constituted an independent student sample in relation to the ICILS 2018 average, while North Rhine-Westphalia was a subsample of the German national student sample (representing 22.5% of the corresponding student population). Therefore, two different formulas had to be applied. For the teacher survey, the German national sample did not meet IEA sample participation requirements and was therefore not included in the ICILS 2018 average for teacher survey results. Consequently, North Rhine-Westphalia, which met sample participation requirements as a benchmarking participant, constituted an independent sample in relation to the ICILS 2018 average for teacher results.


Standard errors for differences between ICILS 2018 averages and Moscow (Russian Federation), both for student and teacher results, as well as North Rhine-Westphalia (Germany) with regard to teacher results, were computed with the following formula:

$$SE_{dif\_ic} = \sqrt{SE_{bp}^2 + SE_{ICILS2018}^2}$$

Here, $SE_{bp}$ is the standard error for the result from the benchmarking participant, and $SE_{ICILS2018}$ is the standard error for the ICILS 2018 average.


For differences between student (or school) survey results in North Rhine-Westphalia (Germany) and the ICILS 2018 average, the standard error $SE_{dif\_StNRW}$ was computed as follows:

$$SE_{dif\_StNRW} = \frac{\sqrt{\left((N-1)^2 - 0.225\right)SE_{StNRW}^2 + \sum_{k=1}^{N} SE_k^2}}{N}$$

Here, $SE_{StNRW}$ represents the standard error for the student survey estimate from North Rhine-Westphalia and $SE_k$ is the sampling error for the $k$-th of $N$ participating countries. The correction for the contribution of the data from this benchmarking participant was set to 0.225 (instead of 1) because its student population was equivalent to 22.5 percent of the overall German population.




Comparisons between CIL results in 2018 and 2013 


For those countries that had participated in the previous cycle, the ICILS 2018 international report also included comparisons of test results for CIL between ICILS 2013 and 2018. Because the process of equating the CIL scores across the cycles introduced some additional error into the calculation of any test statistic, we added an equating error term to the formula for the standard error of the difference between country averages.


When testing the difference of a statistic between the two assessments, we computed the standard 
error of the difference as follows: 


$$SE_{(\mu_{ICILS18} - \mu_{ICILS13})} = \sqrt{SE_{\mu,ICILS18}^2 + SE_{\mu,ICILS13}^2 + EqErr^2}$$

Here, $\mu$ can be any statistic in units on the equated ICILS scale (mean, percentile, gender difference, but not percentages) and $SE_{\mu,ICILS18}$ and $SE_{\mu,ICILS13}$ are the respective standard errors of this statistic from the two surveys. $EqErr$ denotes the equating error that reflects the uncertainty in the link between both assessments, which was equal to 3.9 score points for the CIL scale (see Chapter 11 for the calculation of the respective equating error).
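A minimal sketch of the trend comparison for statistics expressed in score points; the helper name and the example values are hypothetical, and the default equating error is the 3.9 score points stated above for the CIL scale.

```python
import math


def trend_difference(mu_2018, se_2018, mu_2013, se_2013, eq_err=3.9):
    """Difference between cycles on the equated CIL scale and its SE.

    eq_err defaults to the 3.9 score-point CIL equating error.
    """
    se = math.sqrt(se_2018 ** 2 + se_2013 ** 2 + eq_err ** 2)
    return mu_2018 - mu_2013, se


dif, se = trend_difference(520.0, 3.0, 510.0, 4.0)  # hypothetical averages
print(dif, round(se, 2), abs(dif / se) > 1.96)
```

Because the equating error enters every trend comparison, a change between cycles needs to be larger than a same-size difference within one cycle to reach significance.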


To report the significance of differences between ICILS 2018 and 2013 for percentages of students with CIL scores at or above Level 2 (see Chapter 1 for a description of these levels), it was not possible to use the estimated equating error in CIL score points. Therefore, we applied the following replication method to estimate the corresponding equating errors.


To estimate the equating error for the percentage at or above the cut-point that defines the threshold between Levels 1 and 2 (492 score points), within each participating country a number of $n$ replicate cut-points were generated by adding to the observed threshold a random error component with a mean of 0 and a standard deviation equal to the estimated equating error (3.9). Percentages of students at or above each replicate cut-point ($p_r$) were computed for each of these replicated thresholds, and the equating error for each participating country was estimated as:

$$EquErr_{p\_country} = \sqrt{\frac{\sum_{r=1}^{n}\left(p_r - p_o\right)^2}{n}}$$


Here, $p_o$ is the observed percentage of students at or above Level 2. We used 1,000 replicate samples ($n = 1000$) for these computations. The equating errors for the national percentages of students at or above Level 2 were estimated as 1.95 for Chile, 1.23 for Denmark, 1.45 for Germany, and 1.18 for Korea.
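The replicate cut-point procedure can be sketched as follows. This is a simplified illustration under stated assumptions: one set of weighted scores (not five plausible values), a normally distributed shift of the Level 2 cut-point with standard deviation equal to the equating error, and a fixed random seed. Function names and the example scores are hypothetical.

```python
import math
import random


def percent_at_or_above(scores, weights, cut):
    """Weighted percentage of students with a score at or above cut."""
    above = sum(w for s, w in zip(scores, weights) if s >= cut)
    return 100.0 * above / sum(weights)


def equating_error_percentage(scores, weights, cut=492.0, eq_err=3.9,
                              n_rep=1000, seed=2018):
    """Equating error for a percentage via randomly shifted cut-points."""
    rng = random.Random(seed)
    p_obs = percent_at_or_above(scores, weights, cut)
    squared = 0.0
    for _ in range(n_rep):
        replicate_cut = cut + rng.gauss(0.0, eq_err)
        p_rep = percent_at_or_above(scores, weights, replicate_cut)
        squared += (p_rep - p_obs) ** 2
    return math.sqrt(squared / n_rep)


scores = [400.0 + 0.5 * k for k in range(400)]  # hypothetical scores 400-599.5
weights = [1.0] * len(scores)
print(round(equating_error_percentage(scores, weights), 2))
```

The result depends on how many students score close to the cut-point, which is why the estimated equating errors differ across countries.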


Within each participating country, the standard errors for the differences between percentages at or above Level 2 were calculated based on the standard error of the percentage at or above Level 2 in 2018 ($SE_{p,ICILS18}$), the standard error for the corresponding estimate in 2013 ($SE_{p,ICILS13}$), and the estimated equating error ($EquErr_{p\_country}$) as follows:

$$SE_{(p_{ICILS18} - p_{ICILS13})} = \sqrt{SE_{p,ICILS18}^2 + SE_{p,ICILS13}^2 + EquErr_{p\_country}^2}$$


Multiple regression modeling of teacher data 


When reporting ICILS 2018 data, we also used single-level multiple regression models to explain variation in the questionnaire scale scores reflecting teachers' emphasis on teaching CIL and CT, respectively. Predictor variables were teachers' ICT self-efficacy, positive perceptions of pedagogical ICT use, perceptions of higher levels of teacher collaboration, reports on higher levels of availability of ICT resources at their school, and teachers' experience with using ICT during lessons.




Multiple regression models (see, for example, Pedhazur 1997) were estimated as:

$$Y_i = \beta_0 + \boldsymbol{\beta}\mathbf{X}_i + \varepsilon_i$$

Here, for a sample of $i$ teachers, we regressed our criterion variables $Y_i$ (teachers' emphasis on teaching CIL or CT) on a vector of predictors $\mathbf{X}_i$ with its corresponding vector of regression coefficients $\boldsymbol{\beta}$, where $\beta_0$ denotes the intercept, and $\varepsilon_i$ represents the unexplained part of the model (residual).

We reported the unstandardized regression coefficients and the variance explained by the model to show the effects for each predictor and the overall explanatory power of the model. To estimate standard errors for the multiple regression model parameters, we employed jackknife repeated replication using tailored SPSS macros, which can be exactly replicated with the IEA IDB Analyzer.

Table 13.5 shows the numbers of all assessed teachers in each country, of teachers included in each of the multiple regression analyses, as well as the weighted percentages of teachers with valid data for all variables in each of the two estimated models.
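To illustrate how jackknife replication yields standard errors for regression coefficients, the sketch below uses a single predictor and weighted least squares. The report's models use several predictors and were estimated with SPSS macros, so this is only a simplified stand-in; the function names, data, and replicate weights are hypothetical.

```python
import math


def weighted_line(x, y, w):
    """Weighted least-squares intercept and slope for one predictor."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope


def jrr_slope(x, y, w, replicate_weight_sets):
    """Slope from the full weights and its JRR standard error."""
    _, slope = weighted_line(x, y, w)
    sv = sum((weighted_line(x, y, rw)[1] - slope) ** 2
             for rw in replicate_weight_sets)
    return slope, math.sqrt(sv)


# Toy data: predictor x and criterion y for six teachers, three replicates
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.5, 2.0, 4.5, 4.0, 6.0]
w = [1.0] * 6
replicates = [[0.0, 2.0, 1.0, 1.0, 1.0, 1.0],
              [1.0, 1.0, 2.0, 0.0, 1.0, 1.0],
              [1.0, 1.0, 1.0, 1.0, 0.0, 2.0]]
slope, se = jrr_slope(x, y, w, replicates)
print(round(slope, 3), round(se, 3))
```

The same replicate loop applies unchanged to each coefficient of a model with more predictors.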


Table 13.5: ICILS 2018 teachers included in multiple regression analyses of teachers’ emphasis on 
teaching CIL- and CT-related skills 


Country                              Multiple regression analysis of            Multiple regression analysis of
                                     teachers' emphasis on CIL-related skills   teachers' emphasis on CT-related skills
                                     Total number of   Weighted percentage      Total number of   Weighted percentage
                                     teachers in       of teachers in           teachers in       of teachers in
                                     analysis          analysis                 analysis          analysis
Chile                                1625              96                       1622              96
Denmark                              1081              [illegible]              1078              [illegible]
Finland                              1797              97                       1795              97
France                               1405              96                       1400              96
Germany                              2212              94                       2207              94
Italy                                2534              [illegible]              [illegible]       [illegible]
Kazakhstan                           2534              96                       2535              96
Korea, Republic of                   2068              97                       2048              96
Luxembourg                           473               96                       472               96
Moscow (Russian Federation)          2189              97                       2185              97
North Rhine-Westphalia (Germany)     1398              94                       1400              95
Portugal                             2743              [illegible]              2750              98
United States*                       3103              96                       3103              96
Uruguay                              1178              90                       1176              90
Notes: Benchmarking participants in italics. 
* Countries not meeting sample participation requirements. 


Across the participating countries, on average 96 percent of teachers in the sample had valid data for all variables. National percentages of teachers with valid data for all variables ranged from 90 percent in Uruguay to 98 percent in Portugal.³


3 Readers should note that when models include a larger number of predictor variables, the proportion of respondents lost through (list-wise) exclusion of those with omitted data for any of the variables in the analysis is likely to increase.




Hierarchical linear modeling to explain variation in students’ CIL and CT 


To examine which factors are associated with variation in CIL and CT within and across schools in participating countries, we estimated hierarchical (or multilevel) linear regression models within each country (Raudenbush and Bryk 2002) in which students were nested within schools. Predictor variables included variables reflecting students' personal and social background, ICT-related variables at the student level, and ICT-related factors at the school level.


A hierarchical regression model with $i$ students nested in $j$ clusters (schools) can be estimated as:

$$Y_{ij} = \beta_0 + \boldsymbol{\beta}_{i}\mathbf{X}_{ij} + \boldsymbol{\beta}_{m}\mathbf{X}_{mj} + u_{0j} + e_{ij}$$

Here, $Y_{ij}$ is the criterion variable, $\beta_0$ is the intercept, $\mathbf{X}_{ij}$ is a vector of student-level variables with its corresponding vector of regression coefficients $\boldsymbol{\beta}_{i}$, and $\mathbf{X}_{mj}$ is a vector of school-level variables with its corresponding vector of regression coefficients $\boldsymbol{\beta}_{m}$. $u_{0j}$ is the residual term at the level of the cluster (school), and $e_{ij}$ is the student-level residual. Both residual terms are assumed to be normally distributed with a mean of 0 at each level.


The explained variance in hierarchical linear models has to be estimated for each level separately, 
with the estimate based on a comparison of each prediction model with the baseline (“null”) model 
(or ANOVA model) without any predictor variables. 


We estimated the null model, from which we excluded students who still had missing data after the “missing treatment” (see the section on missing treatment below), as:

$$Y_{ij} = \beta_0 + u_{0j(\text{null})} + e_{ij(\text{null})}$$

The variance of the residual term $u_{0j(\text{null})}$ provides an estimate of the variance in $Y_{ij}$ between clusters $j$, and the variance of $e_{ij(\text{null})}$ is an estimate of the variance between students $i$ within clusters. The intra-class correlation $IC$, which reflects the proportion of variance between clusters (in our case, schools), can be computed from these estimates as:

$$IC = \frac{\sigma^2_{u_{0j(\text{null})}}}{\sigma^2_{u_{0j(\text{null})}} + \sigma^2_{e_{ij(\text{null})}}}$$
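As a worked illustration of the formula above, the sketch below computes $IC$ from the two null-model variance components. The variance values are hypothetical and the function name is ours.

```python
def intraclass_correlation(var_between, var_within):
    """IC = var_between / (var_between + var_within), from null-model variances."""
    if var_between < 0 or var_within < 0:
        raise ValueError("variance components must be non-negative")
    total = var_between + var_within
    if total == 0:
        raise ValueError("total variance must be positive")
    return var_between / total

# Hypothetical null-model variance components (not ICILS results):
# between-school variance 1600, within-school variance 4900
ic = intraclass_correlation(1600.0, 4900.0)  # roughly a quarter of the variance lies between schools
```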


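Based on the estimates of variances at the school and student levels derived from the null model, we computed the explained variance at the school level $EV_j$ as:

$$EV_j = \left(1 - \frac{\sigma^2_{u_{0j}}}{\sigma^2_{u_{0j(\text{null})}}}\right) \times 100$$

where $\sigma^2_{u_{0j}}$ denotes the school-level residual variance of the prediction model.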

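We computed the explained variance at the student level $EV_{ij}$ as:

$$EV_{ij} = \left(1 - \frac{\sigma^2_{e_{ij}}}{\sigma^2_{e_{ij(\text{null})}}}\right) \times 100$$

where $\sigma^2_{e_{ij}}$ denotes the student-level residual variance of the prediction model.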
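The two explained-variance formulas share the same form, so a single helper covers both levels. The sketch assumes the residual variances of the null and prediction models are already available (in ICILS they came from the multilevel software output); the numbers and the function name are hypothetical.

```python
def explained_variance(var_null, var_model):
    """Percentage of the null-model variance at one level explained by predictors.

    EV = (1 - var_model / var_null) * 100. The value can be negative if adding
    predictors increases the residual variance at that level.
    """
    if var_null <= 0:
        raise ValueError("null-model variance must be positive")
    return (1.0 - var_model / var_null) * 100.0

# Hypothetical variance components from a null model and a prediction model
ev_school = explained_variance(var_null=1600.0, var_model=400.0)    # school level
ev_student = explained_variance(var_null=4900.0, var_model=3675.0)  # student level
```

Because each level is compared only with its own null-model variance, a model can explain a large share of the between-school variance while explaining little of the within-school variance.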


Because multilevel modeling takes the hierarchical structure of the cluster sample into account 
and we used plausible values as estimates of students’ CIL and CT in our models, the reported 
multilevel standard errors reflect both sampling and imputation errors. 
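The section does not spell out how results from the plausible values were combined. The Mplus documentation describes analyses of multiple imputed data sets as averaging the parameter estimates and combining the average within-analysis variance with the between-analysis variation of the estimates (Rubin's rules). The sketch below assumes that approach; the function name and the five estimates and standard errors are hypothetical.

```python
import math
import statistics

def combine_plausible_values(estimates, standard_errors):
    """Combine one coefficient estimated separately with each of M plausible values.

    Rubin's rules: the point estimate is the mean of the M estimates; the total
    variance is the mean sampling variance plus (1 + 1/M) times the variance of
    the estimates across plausible values (imputation variance).
    Returns (estimate, standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need at least two estimates, each with a standard error")
    point = statistics.fmean(estimates)
    sampling_var = statistics.fmean(se ** 2 for se in standard_errors)
    imputation_var = statistics.variance(estimates)  # divisor M - 1
    total_var = sampling_var + (1.0 + 1.0 / m) * imputation_var
    return point, math.sqrt(total_var)

# Hypothetical coefficient and standard error from each of five plausible values
est, se = combine_plausible_values([12.1, 11.8, 12.4, 12.0, 11.7],
                                   [2.0, 2.1, 1.9, 2.0, 2.0])
```

The combined standard error is always at least as large as the average single-analysis standard error, which is how the imputation error enters the reported results.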


National data were weighted with normalized school-level and within-school student-level weights, where the sum of within- and between-school weights is equal to the sample size (Asparouhov 2006). Normalized (or scaled) within-school student-level weights $w_{ij}$ were calculated as:

$$w_{ij} = \frac{n_j \, ow_{ij}}{\sum_{i} ow_{ij}}$$

Here, $n_j$ represents the cluster size (equal to the number of students with valid data within each school) and $ow_{ij}$ the original within-school student-level weights. Normalized (or scaled) school-level weights $w_j$ were computed as:

$$w_j = \frac{n \, ow_j}{\sum_{j}\sum_{i} w_{ij} \, ow_j}$$

Here, $n$ is the total sample size and $ow_j$ denotes the original school-level weights.
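The two scaling steps can be sketched as follows, assuming each record holds a school identifier, the original student weight $ow_{ij}$, and the original school weight $ow_j$ for one student with valid data. The function name and record layout are hypothetical.

```python
from collections import defaultdict

def scale_weights(records):
    """Scale student and school weights for a two-level model.

    records: list of (school_id, ow_ij, ow_j) tuples, one per student with
    valid data. Returns (w_ij, w_j): scaled student weights aligned with
    records, and a dict mapping school_id to its scaled school weight.
    """
    n = len(records)                # total sample size
    n_j = defaultdict(int)          # cluster size per school
    sum_ow_ij = defaultdict(float)  # sum of original student weights per school
    ow_j = {}
    for school, ow_ij, ow_school in records:
        n_j[school] += 1
        sum_ow_ij[school] += ow_ij
        ow_j[school] = ow_school
    # w_ij = n_j * ow_ij / sum_i ow_ij, so within-school weights sum to n_j
    w_ij = [n_j[s] * ow / sum_ow_ij[s] for s, ow, _ in records]
    # Denominator: sum over all students of w_ij * ow_j
    denom = sum(w * ow_j[s] for (s, _, _), w in zip(records, w_ij))
    # w_j = n * ow_j / denom, so the products w_j * w_ij sum to n
    w_j = {s: n * ow_j[s] / denom for s in ow_j}
    return w_ij, w_j

# Hypothetical example: two schools with two and three students
recs = [("A", 1.0, 10.0), ("A", 3.0, 10.0),
        ("B", 2.0, 30.0), ("B", 2.0, 30.0), ("B", 2.0, 30.0)]
w_ij, w_j = scale_weights(recs)
```

With these weights, the student weights in each school sum to $n_j$ and the products $w_j w_{ij}$ sum to $n$, which matches the property described in the text.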


We used the software package Mplus (Version 7; see Muthén and Muthén 2012) to estimate all hierarchical models. Even though Luxembourg had met the sample participation requirements for the student survey, and 38 out of 41 of its schools with target grade students participated, we excluded its data from the analyses: the low number of schools would have resulted in greatly reduced statistical power and rather limited precision in the estimation of school-level effects. Results from the United States, which did not meet IEA sample participation requirements, were reported separately and should be interpreted with caution.


As is typical in multivariate analyses, the proportions of missing data increased as more variables were included in the model. To address the higher proportions of missing responses for some of the variables, the analysis included a “dummy variable adjustment” for these data (see Cohen and Cohen 1975). For each of these variables, we assigned mean or median values to cases with missing data and added dummy indicator variables (with 1 indicating a missing value and 0 a non-missing value) to the analysis.
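A minimal sketch of the dummy variable adjustment for a single predictor, assuming missing values are coded as `None`. The function name is hypothetical, and the text does not specify which variables received mean rather than median values, so the choice is left as a parameter.

```python
import statistics

def dummy_variable_adjustment(values, fill="mean"):
    """Impute missing values (None) and return (imputed, missing_indicator).

    fill: "mean" or "median" of the observed values. The indicator is 1 where
    the value was missing and 0 otherwise; both columns enter the regression.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    if fill == "mean":
        center = statistics.fmean(observed)
    elif fill == "median":
        center = statistics.median(observed)
    else:
        raise ValueError("fill must be 'mean' or 'median'")
    imputed = [center if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return imputed, indicator

# Hypothetical scale scores for five students, two of them missing
imputed, missing = dummy_variable_adjustment([2.0, None, 4.0, 3.0, None])
```

The coefficient on the indicator then captures how students with missing data differ on the outcome from students with observed data, which is what the missing-indicator coefficients in Table 13.6 report.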


At the student level, we observed higher proportions of missing values for the scale reflecting use of general ICT applications in class and for the scales measuring students' perceptions of learning of CIL- or CT-related skills (which were included as predictor variables for explaining CIL and CT, respectively). Because, in almost all cases, teacher information was either available for both predictor variables derived from this survey (average years of teachers' experience with ICT use during lessons and teachers' reports on students' ICT use for class activities) or missing for both, only one missing indicator was created to indicate missing teacher data for both variables. Two further missing indicators were used for missing school principal data regarding schools' expectations of teacher communication via ICT, and for missing ICT coordinator data about the availability of ICT resources at school.

Table 13.6 shows the coefficients for missing indicators included in the multilevel analyses of CIL and CT in each country. Student-level missing indicators tended to be negatively related with CIL and CT, respectively, while patterns were less consistent for school-level predictors derived from the teacher, principal, and ICT coordinator surveys. In countries where there were no schools with missing information from either school or teacher questionnaires, it was not necessary to apply any treatment. Consequently, no missing indicator coefficients are displayed in this table for these cases (as in Korea for school principal and ICT coordinator questionnaire data, in Kazakhstan for teacher information, and in Moscow for all school-level data).


Table 13.7 shows the numbers of students included in the multilevel analyses, as well as the respective weighted percentages of students with valid data for all variables in the model.

For the multilevel analyses of CIL, on average across ICILS 2018 countries meeting sample participation requirements, 92 percent of students had valid data for all variables included in the model. In Germany and Uruguay, however, only 84 and 83 percent (respectively) of the weighted sample had valid data for inclusion in the analyses of CIL, and consequently their results should be interpreted with due caution. For the analysis of variation in CT, on average 94 percent of students had valid data for all variables in the model. However, in Germany, only 84 percent of the weighted sample could be included in these analyses, with similar caveats that should be observed when interpreting the respective results.




Table 13.6: Coefficients of missing indicators in multilevel analysis of CIL and CT

[Rotated table; cell values are not recoverable from the source scan. For each country, the table reports coefficients (with standard errors in parentheses) for CIL and CT on five missing indicators: missing indicator for use of general ICT applications in class; missing indicator for CIL- and CT-related learning, respectively; missing indicator for teacher data; missing indicator for schools' expectations of teacher communication via ICT; missing indicator for ICT resources at school.]

Notes: Benchmarking participants in italics.
* Countries not meeting sample participation requirements.
N/A = Full information available.




Table 13.7: ICILS 2018 students included in multilevel analyses of variation in CIL and CT 


Country | CIL: Number of students in analysis | CIL: Weighted % of students in analysis | CT: Number of students in analysis | CT: Weighted % of students in analysis
Chile | 2888 | 93 | |
Denmark | [illegible] | [illegible] | [illegible] | [illegible]
Finland | 2453 | 97 | 2453 | 97
France | 2686 | 92 | 2686 | 92
Germany | 2983 | 84 | 2983 | 84
Italy | [illegible] | [illegible] | |
Kazakhstan | 3169 | 95 | |
Korea, Republic of | 2786 | 97 | 2786 | 97
Moscow (Russian Federation) | 2718 | 95 | |
North Rhine-Westphalia (Germany) | 1562 | 79 | 1562 | 79
Portugal | 3081 | 96 | 3081 | 96
United States* | 5829 | 86 | 5829 | 86
Uruguay | 2126 | 83 | |


Notes: Benchmarking participants in italics. 


* Countries not meeting sample participation requirements. 


References 


Asparouhov, T. (2006). General multilevel modeling with sampling weights. Communications in Statistics: Theory and Methods, 35(3), 439–460.

Cohen, J., & Cohen, P. (1975). Applied multiple regression/correlation analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates.

Gonzalez, E.J., & Foy, P. (2000). Estimation of sampling variance. In M.O. Martin, K.D. Gregory, & S.E. Semler (Eds.), TIMSS 1999: Technical report. Chestnut Hill, MA: Boston College.

IBM Corp. (2017). IBM SPSS Statistics for Windows, Version 25.0 [statistical software]. Armonk, NY: Author.

Muthén, L.K., & Muthén, B.O. (2012). Mplus: Statistical analysis with latent variables. User's guide (Version 7). Los Angeles, CA: Muthén & Muthén.

Pedhazur, E.J. (1997). Multiple regression in behavioral research: Explanation and prediction (3rd ed.). Fort Worth, TX: Harcourt Brace College.

Raudenbush, S.W., & Bryk, A.S. (2002). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage Publishers.

SAS Institute Inc. (2017). SAS/STAT® 14.3 user's guide [statistical software]. Cary, NC: Author.

Schulz, W. (2015). Reporting of ICILS results. In J. Fraillon, W. Schulz, T. Friedman, J. Ainley, & E. Gebhardt (Eds.), International Computer and Information Literacy Study 2013 technical report (pp. 221–232). Amsterdam, The Netherlands: International Association for the Evaluation of Educational Achievement (IEA).

StataCorp. (2013). Stata user's guide. Release 13 [statistical software]. College Station, TX: Author.

Westat. (2007). WesVar® 4.3: User's guide [computer software]. Rockville, MD: Author.

Wolter, K.M. (1985). Introduction to variance estimation. New York, NY: Springer.




APPENDIX A: 


Organizations and individuals involved 
in ICILS 2018 


International study center 




The international study center is located at the Australian Council for Educational Research (ACER).
ACER is responsible for designing and implementing the study in close cooperation with IEA.


Staff at ACER 

Julian Fraillon, research director 

John Ainley, project coordinator 

Wolfram Schulz, assessment coordinator 

Tim Friedman, project researcher 

Daniel Duckworth, test developer 
Melissa Hughes, test developer

Laila Helou, quality assurer 

Alex Daraganov, data analyst 

Renee Kwong, data analyst 

Leigh Patterson, data analyst 

Louise Ockwell, data analyst 

Katja Bischof, project researcher 


International Association for the Evaluation of Educational Achievement (IEA) 


IEA provided overall support in coordinating and implementing ICILS 2018. IEA Amsterdam, the
Netherlands, was responsible for membership, translation verification, quality control, and the
publication and wider dissemination of the report. IEA Hamburg, Germany, was mainly responsible 
for managing field operations, sampling procedures, and data processing. 


Staff at IEA Amsterdam 


Dirk Hastedt, executive director 
Andrea Netten, director IEA Amsterdam 
Roel Burgers, financial director 
Michelle Djekić, research and liaison officer (former)
Sandra Dohr, junior research officer
David Ebbs, senior research officer
Sive Finlay, head of communications (former)
Isabelle Gémin, senior financial officer
Mirjam Govaerts, public relations and events officer
Gina Lamprell, junior publications officer 
Jennifer Ross, media and outreach officer 
Jasmin Schiffer, graphic designer 
Jan-Philipp Wagner, junior research officer 
Gillian Wilson, senior publications officer 




Staff at IEA Hamburg 

Juliane Hencke, director IEA Hamburg 

Heiko Sibberns, director IEA Hamburg (former) 

Ralph Carstens, senior research advisor 

Sebastian Meyer, ICILS international data manager 

Michael Jung, ICILS international data manager (former) 
Ekaterina Mikheeva, ICILS deputy international data manager 
Lars Borchert, ICILS deputy international data manager (former) 
Sabine Meinck, head of research and analysis, and sampling units 
Sabine Tieck, research analyst (sampling) 

Sabine Weber, research analyst (sampling) 

Karsten Penon, research analyst (sampling) 

Duygu Savaşcı, research analyst (sampling)

Oriana Mora, research analyst 

Adeoye Oyekan, research analyst 

Hannah Kohler, research analyst 

Lorelia Lerps, research analyst 

Rea Car, research analyst 

Clara Beyer, research analyst 

Yasin Afana, research analyst 

Guido Martin, head of coding unit 

Katharina Sedelmayr, research analyst (coding) 

Deepti Kalamadi, programmer 

Maike Junod, programmer

Limiao Duan, programmer 

Devi Prasath, programmer (former) 

Bettina Wietzorek, meeting and seminar coordinator 


RM Results 


RM Results was responsible for developing the software systems underpinning the computer- 
based student assessment instruments for the main survey. This work included development of 
the test and questionnaire items, the assessment delivery system, and the web-based translation, 
scoring, and data-management modules. 


RM Results 
Mike Janic, managing director
Stephen Birchall, deputy CEO 
Erhan Halil, product development manager 
Rakshit Shingala, team leader 
James Liu, analyst programmer 

ilupuli Lunuwila, analyst programmer 
Richard Feng, analyst programmer 
Stephen Ainley, quality assurance 
Ranil Weerasinghe, quality assurance 
Grigory Loskutov, IT coordinator 




ICILS sampling referee 


Marc Joncas was the sampling referee for the study. He provided invaluable advice on all sampling- 
related aspects of the study. 


National research coordinators 


The national research coordinators played a crucial role in the development of the project. They 
provided policy- and content-oriented advice on the development of the instruments and were 
responsible for the implementation of ICILS in the participating countries. 


Chile 

Carolina Leyton 

Maria Victoria Martinez 

Tabita Nilo 

National Agency for Educational Quality 


Denmark 
Jeppe Bundsgaard 
Danish School of Education, Aarhus University 


Finland 
Kaisa Leino 
Finnish Institute for Educational Research, University of Jyväskylä


France 

Marion Le Cam 

Ministry of National Education 

Germany and North Rhine-Westphalia (Germany) 
Birgit Eickelmann 

Institute for Educational Science, University of Paderborn 


Italy 

Elisa Caponera 

Riccardo Pietracci 

INVALSI (Istituto Nazionale per la Valutazione del Sistema Educativo di Istruzione e di Formazione) 
Gemma De Sanctis (until May 2018) 

MIUR (Ministero dell'Istruzione, dell'Università e della Ricerca)

Kazakhstan 

Aigerim Zuyeva 

Ruslan Abrayev 

Department for International Comparative Studies, Ministry of Education and Science 


Luxembourg 
Catalina Lomos 
Luxembourg Institute of Socio-Economic Research (LISER) 


Moscow (Russian Federation) 
Elena Zozulia 
Moscow Center for Quality of Education 


Portugal 
Vanda Lourenço
IAVE, IP (Institute of Educational Evaluation)


Republic of Korea 

Sangwook Park 

Kyongah Sang 

Korea Institute for Curriculum and Evaluation 




United States 

Lydia Malley 

Linda Hamilton 

National Center for Education Statistics, US Department of Education 
Uruguay 

Cristobal Cobo 

Center for Research—Ceibal Foundation 


Cecilia Hughes 
Evaluation and Monitoring Department at Plan Ceibal 




APPENDIX B:
Characteristics of national samples 


For each educational system participating in ICILS 2018, this appendix describes population 
coverage, exclusion categories, stratification variables, and any deviations from the general ICILS 
sampling design. 


The same sample of schools was selected for the student survey and the teacher survey. However, a school's participation status can differ between the student and teacher surveys. Most commonly, a school counts as participating in the student survey but not in the teacher survey, although the reverse is also possible. Where participation status differs between the two parts of ICILS 2018, the figures are displayed in two separate tables; where the status counts are identical in both parts, the results are displayed in one combined table.


Numbers in brackets refer to the number of categories of specific stratification variables.


B.1 Chile 


• School level exclusions consisted of schools for children with special educational needs, very small schools (less than six students in the target grade), and geographically inaccessible schools. Within-school exclusions consisted of intellectually disabled students, functionally disabled students, and non-native language speakers.

• Explicit stratification was performed by school type (grade 8 and 9, grade 8 only), school administration type (public, private-subsidized, private), and urbanization (rural, urban), resulting in 10 explicit strata.

• Implicit stratification was applied by national assessment performance group for mathematics (four levels), giving a total of 36 implicit strata.

• The sample was disproportionally allocated to explicit strata.

• Schools were oversampled to allow for better estimates for private and rural schools.

• Small schools were selected with equal probabilities.


Table B.1.1: Allocation of student sample in Chile 


School participation status—student survey 


Explicit strata | Total sampled schools | Ineligible schools | Participating schools: sampled | Participating schools: first replacement | Participating schools: second replacement | Non-participating schools
Grade 8 & 9 - Public - Rural | 4 | 0 | 4 | 0 | 0 | 0
Grade 8 & 9 - Public - Urban | 10 | 0 | 9 | 1 | 0 | 0
Grade 8 & 9 - Private subsidized - Rural | 4 | 0 | 4 | 0 | 0 | 0
Grade 8 & 9 - Private subsidized - Urban | 20 | 0 | 17 | 2 | 1 | 0
Grade 8 & 9 - Private - Urban & rural | 46 | 0 | 38 | 3 | 5 | 0
Grade 8 only - Public - Rural | 30 | 0 | 29 | 1 | 0 | 0
Grade 8 only - Public - Urban | 28 | 0 | 27 | 1 | 0 | 0
Grade 8 only - Private subsidized - Rural | 12 | 0 | 12 | 0 | 0 | 0
Grade 8 only - Private subsidized - Urban | 22 | 1 | 21 | 0 | 0 | 0
Grade 8 only - Private - Urban | 4 | 0 | 2 | 1 | 0 | 1
Total | 180 | 1 | 163 | 9 | 6 | 1

Note: No schools with student participation rate below 50% were found. 




Table B.1.2: Allocation of teacher sample in Chile 
School participation status—Teacher survey 
Explicit strata | Total sampled schools | Ineligible schools | Participating schools: sampled | Participating schools: first replacement | Participating schools: second replacement | Non-participating schools
Grade 8 & 9 - Public - Rural | 4 | 0 | 4 | 0 | 0 | 0
Grade 8 & 9 - Public - Urban | 10 | 0 | 9 | 0 | 0 | 1
Grade 8 & 9 - Private subsidized - Rural | 4 | 0 | 4 | 0 | 0 | 0
Grade 8 & 9 - Private subsidized - Urban | 20 | 0 | 17 | 1 | 1 | 1
Grade 8 & 9 - Private - Urban & rural | 46 | 0 | 37 | 3 | 5 | 1
Grade 8 only - Public - Rural | 30 | 0 | 28 | 1 | 0 | 1
Grade 8 only - Public - Urban | 28 | 0 | 27 | 1 | 0 | 0
Grade 8 only - Private subsidized - Rural | 12 | 0 | 12 | 0 | 0 | 0
Grade 8 only - Private subsidized - Urban | 22 | 1 | 21 | 0 | 0 | 0
Grade 8 only - Private - Urban | 4 | 0 | 2 | 1 | 0 | 1
Total | 180 | 1 | 161 | 7 | 6 | 5
Note: Four schools with teacher participation rate below 50% were found. 


B.2 Denmark 


• School level exclusions consisted of schools for children with special education needs, treatment centers, schools with less than five students in the target grade, and German, English, and Waldorf schools. Within-school exclusions consisted of intellectually disabled students, functionally disabled students, and non-native language speakers.

• No explicit stratification was performed.

• Implicit stratification was applied by assessment score, giving a total of five implicit strata.

• Small schools were selected with equal probabilities.


Table B.2.1: Allocation of student sample in Denmark


School participation status—Student survey 


Explicit strata | Total sampled schools | Ineligible schools | Participating schools: sampled | Participating schools: first replacement | Participating schools: second replacement | Non-participating schools
Denmark | 150 | 0 | 114 | 25 | 4 | 7
Total | 150 | 0 | 114 | 25 | 4 | 7

Note: No schools with student participation rate below 50% were found. 
Table B.2.2: Allocation of teacher sample in Denmark 
School participation status—Teacher survey 

Explicit strata | Total sampled schools | Ineligible schools | Participating schools: sampled | Participating schools: first replacement | Participating schools: second replacement | Non-participating schools
Denmark | 150 | 0 | 109 | 25 | 4 | 12
Total | 150 | 0 | 109 | 25 | 4 | 12


Note: Two schools were regarded as non-participating because the within-school participation rate was below 50%. 


B.3 Finland 
• School level exclusions consisted of schools for children with special education needs and schools with an instruction language other than Finnish or Swedish. Within-school exclusions consisted of intellectually disabled students, functionally disabled students, and non-native language speakers.
• Explicit stratification was performed by region (5), urbanization (urban, semi-urban, rural), and language (2), resulting in nine explicit strata.
• Implicit stratification was applied by region (4) and urbanization (urban, semi-urban, rural), giving a total of 17 implicit strata.
• School sample overlap between ICILS 2018 and TALIS 2018¹: Both samples were drawn at once using minimum overlap control.
• The samples were proportionally allocated to explicit strata.
• All schools were selected with equal probabilities.
Table B.3.1: Allocation of student sample in Finland 
School participation status—Student survey 
Explicit strata | Total sampled schools | Ineligible schools | Participating schools: sampled | Participating schools: first replacement | Participating schools: second replacement | Non-participating schools | Excluded schools
Helsinki/Uusimaa - Urban & semi-urban & rural | 36 | 0 | 35 | 0 | 0 | 1 | 0
Southern - Urban & semi-urban | 26 | 0 | 26 | 0 | 0 | 0 | 0
Southern - Rural | 6 | 0 | 6 | 0 | 0 | 0 | 0
Western - Urban & semi-urban | 28 | 1 | 25 | 0 | 0 | 1 | 1
Western - Rural | 6 | 0 | 5 | 0 | 0 | 0 | 1
Northern & Eastern - Urban & semi-urban | 28 | 1 | 26 | 1 | 0 | 0 | 0
Northern & Eastern - Rural | 10 | 0 | 10 | 0 | 0 | 0 | 0
Swedish speaking - No Åland | 8 | 0 | 8 | 0 | 0 | 0 | 0
Swedish speaking - Åland | 2 | 0 | 2 | 0 | 0 | 0 | 0
Total | 150 | 2 | 143 | 1 | 0 | 2 | 2
Note: No schools with student participation rate below 50% were found. 


1 ICILS 2018 was conducted in the same year as the OECD's Teaching and Learning International Survey (TALIS) 2018 and Programme for International Student Assessment (PISA) 2018, as well as some national education surveys. The ICILS 2018 sampling team collaborated closely with the staff implementing sampling for these studies to prevent school sample overlap whenever possible. See Chapter 6 for further details.




Table B.3.2: Allocation of teacher sample in Finland 




School participation status - Teacher survey


Explicit strata | Total sampled schools | Ineligible schools | Participating schools: sampled | Participating schools: first replacement | Participating schools: second replacement | Non-participating schools | Excluded schools
Helsinki/Uusimaa - Urban & semi-urban & rural | 36 | 0 | 35 | 0 | 0 | 1 | 0
Southern - Urban & semi-urban | 26 | 0 | 26 | 0 | 0 | 0 | 0
Southern - Rural | 6 | 0 | 6 | 0 | 0 | 0 | 0
Western - Urban & semi-urban | 28 | 1 | 24 | 0 | 0 | 2 | 1
Western - Rural | 6 | 0 | 5 | 0 | 0 | 0 | 1
Northern & Eastern - Urban & semi-urban | 28 | 1 | 26 | 1 | 0 | 0 | 0
Northern & Eastern - Rural | 10 | 0 | 10 | 0 | 0 | 0 | 0
Swedish speaking - No Åland | 8 | 0 | 8 | 0 | 0 | 0 | 0
Swedish speaking - Åland | 2 | 0 | 2 | 0 | 0 | 0 | 0
Total | 150 | 2 | 142 | 1 | 0 | 3 | 2

Note: One school with teacher participation rate below 50% was found. 


B.4 France 


• School level exclusions consisted of private schools without contract, schools in the overseas territories and in Mayotte, and specialized schools. Within-school exclusions consisted of intellectually disabled students, functionally disabled students, and non-native language speakers.

• Explicit stratification was performed by school type (public school non priority education, public school priority education, private school), urbanization (less than 10,000 inhabitants, 10,000 to 200,000 inhabitants, more than 200,000 inhabitants), and equipment (large digital equipment, normal digital equipment), resulting in 18 explicit strata.

• No implicit stratification was applied.

• School sample overlap between ICILS 2018 and TALIS 2018: ICILS sample was selected using minimum overlap control to TALIS 2018.

• Schools with large digital equipment strata were oversampled to allow for better estimates.

• Small schools were selected with equal probabilities.




Table B.4.1: Allocation of student sample in France 


School participation status—Student survey 


Explicit strata | Total sampled schools | Ineligible schools | Participating schools: sampled | Participating schools: first replacement | Participating schools: second replacement | Non-participating schools
Public school - Non priority education - Less than 10,000 inhabitants - Large digital equipment | 9 | 0 | 9 | 0 | 0 | 0
Public school - Non priority education - Less than 10,000 inhabitants - Normal digital equipment | 24 | 0 | 24 | 0 | 0 | 0
Public school - Non priority education - 10,000 to 200,000 inhabitants - Large digital equipment | 8 | 0 | 8 | 0 | 0 | 0
Public school - Non priority education - 10,000 to 200,000 inhabitants - Normal digital equipment | 18 | 0 | 17 | 1 | 0 | 0
Public school - Non priority education - More than 200,000 inhabitants - Large digital equipment | 12 | 0 | 12 | 0 | 0 | 0
Public school - Non priority education - More than 200,000 inhabitants - Normal digital equipment | 19 | 0 | 19 | 0 | 0 | 0
Public school - Priority education - Less than 10,000 inhabitants - Large digital equipment | 4 | 0 | 4 | 0 | 0 | 0
Public school - Priority education - Less than 10,000 inhabitants - Normal digital equipment | 4 | 0 | 4 | 0 | 0 | 0
Public school - Priority education - 10,000 to 200,000 inhabitants - Large digital equipment | 4 | 0 | 4 | 0 | 0 | 0
Public school - Priority education - 10,000 to 200,000 inhabitants - Normal digital equipment | 6 | 0 | 6 | 0 | 0 | 0
Public school - Priority education - More than 200,000 inhabitants - Large digital equipment | 5 | 0 | 5 | 0 | 0 | 0
Public school - Priority education - More than 200,000 inhabitants - Normal digital equipment | 8 | 0 | 8 | 0 | 0 | 0
Private school - Less than 10,000 inhabitants - Large digital equipment | 4 | 0 | 4 | 0 | 0 | 0
Private school - Less than 10,000 inhabitants - Normal digital equipment | 6 | 0 | 6 | 0 | 0 | 0
Private school - 10,000 to 200,000 inhabitants - Large digital equipment | 4 | 0 | 4 | 0 | 0 | 0
Private school - 10,000 to 200,000 inhabitants - Normal digital equipment | 8 | 0 | 8 | 0 | 0 | 0
Private school - More than 200,000 inhabitants - Large digital equipment | 4 | 0 | 4 | 0 | 0 | 0
Private school - More than 200,000 inhabitants - Normal digital equipment | 9 | 0 | 9 | 0 | 0 | 0
Total | 156 | 0 | 155 | 1 | 0 | 0


Note: No schools with student participation rate below 50% were found. 


Table B.4.2: Allocation of teacher sample in France 


School participation status—Teacher survey 


Explicit strata | Total sampled schools | Ineligible schools | Participating schools: sampled | Participating schools: first replacement | Participating schools: second replacement | Non-participating schools
Public school - Non priority education - Less than 10,000 inhabitants - Large digital equipment | 9 | 0 | 6 | 0 | 0 | 3
Public school - Non priority education - Less than 10,000 inhabitants - Normal digital equipment | 24 | 0 | 21 | 0 | 0 | 3
Public school - Non priority education - 10,000 to 200,000 inhabitants - Large digital equipment | 8 | 0 | 8 | 0 | 0 | 0
Public school - Non priority education - 10,000 to 200,000 inhabitants - Normal digital equipment | 18 | 0 | 15 | 0 | 0 | 3
Public school - Non priority education - More than 200,000 inhabitants - Large digital equipment | 12 | 0 | 8 | 0 | 0 | 4
Public school - Non priority education - More than 200,000 inhabitants - Normal digital equipment | 19 | 0 | 14 | 0 | 0 | 5
Public school - Priority education - Less than 10,000 inhabitants - Large digital equipment | 4 | 0 | 3 | 0 | 0 | 1
Public school - Priority education - Less than 10,000 inhabitants - Normal digital equipment | 4 | 0 | 3 | 0 | 0 | 1
Public school - Priority education - 10,000 to 200,000 inhabitants - Large digital equipment | 4 | 0 | 2 | 0 | 0 | 2
Public school - Priority education - 10,000 to 200,000 inhabitants - Normal digital equipment | 6 | 0 | 4 | 0 | 0 | 2
Public school - Priority education - More than 200,000 inhabitants - Large digital equipment | 5 | 0 | 4 | 0 | 0 | 1
Public school - Priority education - More than 200,000 inhabitants - Normal digital equipment | 8 | 0 | 3 | 0 | 0 | 5
Private school - Less than 10,000 inhabitants - Large digital equipment | 4 | 0 | 4 | 0 | 0 | 0
Private school - Less than 10,000 inhabitants - Normal digital equipment | 6 | 0 | 5 | 0 | 0 | 1
Private school - 10,000 to 200,000 inhabitants - Large digital equipment | 4 | 0 | 4 | 0 | 0 | 0
Private school - 10,000 to 200,000 inhabitants - Normal digital equipment | 8 | 0 | 7 | 0 | 0 | 1
Private school - More than 200,000 inhabitants - Large digital equipment | 4 | 0 | 3 | 0 | 0 | 1
Private school - More than 200,000 inhabitants - Normal digital equipment | 9 | 0 | 8 | 0 | 0 | 1
Total | 156 | 0 | 122 | 0 | 0 | 34

Note: Thirty-four schools were regarded as non-participating because the within-school participation rate was below 50%. 




B.5 Germany


• School level exclusions consisted of special education schools and very small schools (less than three students in the target grade). Within-school exclusions consisted of intellectually disabled students, functionally disabled students, and non-native language speakers.

• Explicit stratification for regular schools was performed by federal state (North Rhine-Westphalia, other federal states) and school type (Gymnasium, non-Gymnasium). Special education schools with students able to do the test were placed in a separate stratum, resulting in five explicit strata.

• Implicit stratification for regular schools was applied by socioeconomic status predictor (3 levels) and federal state (16 levels). For special education schools, implicit stratification was done by federal states (16 levels), giving a total of 65 implicit strata.

• School sample overlap between ICILS 2018, PISA 2018, and the national assessment of educational standards (Bildungstrend 2018): ICILS sample was selected using minimum overlap control to both surveys.

• The sample was disproportionally allocated to explicit strata.

• Schools in North Rhine-Westphalia were oversampled due to the benchmark characteristic.

• Small schools were selected with equal probabilities.


Table B.5.1: Allocation of student sample in Germany


School participation status—Student survey 


Explicit strata | Total sampled schools | Ineligible schools | Participating schools: sampled | Participating schools: first replacement | Participating schools: second replacement | Non-participating schools | Excluded schools
North Rhine-Westphalia - Gymnasium | 42 | 0 | 38 | 4 | 0 | 0 | 0
North Rhine-Westphalia - Non-Gymnasium | 72 | 3 | 65 | 1 | 0 | 3 | 0
Other federal states - Gymnasium | 44 | 0 | 37 | 2 | 1 | 4 | 0
Other federal states - Non-Gymnasium | 72 | 0 | 51 | 4 | 3 | 13 | 1
Special education schools - None | 4 | 1 | 2 | 0 | 1 | 0 | 0
Total | 234 | 4 | 193 | 11 | 5 | 20 | 1

Note: Six schools were regarded as non-participating because the within-school participation rate was below 50%. 




Table B.5.2: Allocation of teacher sample in Germany 


School participation status—Teacher survey 


Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools | Excluded schools
North Rhine-Westphalia - Gymnasium | 42 | 0 | 36 | 4 | 0 | 2 | 0
North Rhine-Westphalia - Non-Gymnasium | 72 | 3 | 65 | 1 | 0 | 3 | 0
Other federal states - Gymnasium | 44 | 0 | 25 | 1 | 1 | 17 | 0
Other federal states - Non-Gymnasium | 72 | 0 | 41 | 4 | 2 | 24 | 1
Special education schools - None | 4 | 1 | 2 | 0 | 0 | 1 | 0
Total | 234 | 4 | 169 | 10 | 3 | 47 | 1


Note: Thirty-three schools were regarded as non-participating because the within-school participation rate was below 50%. 
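Every row of these allocation tables follows one accounting identity. The total number of sampled schools equals the ineligible schools, plus the participating schools (sampled, first replacement, second replacement), plus the non-participating and excluded schools. The minimal Python sketch below checks that identity for each row of Table B.5.2 and compares the column sums with the reported totals. The function and variable names are chosen for illustration only.

```python
# Rows of Table B.5.2 (teacher survey, Germany), in column order:
# (total sampled, ineligible, sampled participating, first replacement,
#  second replacement, non-participating, excluded)
TABLE_B52 = {
    "North Rhine-Westphalia - Gymnasium": (42, 0, 36, 4, 0, 2, 0),
    "North Rhine-Westphalia - Non-Gymnasium": (72, 3, 65, 1, 0, 3, 0),
    "Other federal states - Gymnasium": (44, 0, 25, 1, 1, 17, 0),
    "Other federal states - Non-Gymnasium": (72, 0, 41, 4, 2, 24, 1),
    "Special education schools - None": (4, 1, 2, 0, 0, 1, 0),
}
REPORTED_TOTAL = (234, 4, 169, 10, 3, 47, 1)


def row_is_consistent(row):
    """Return True if the total equals the sum of the status categories."""
    total, *parts = row
    return total == sum(parts)


def column_totals(rows):
    """Sum every column over the given rows."""
    return tuple(sum(column) for column in zip(*rows))


if __name__ == "__main__":
    for stratum, row in TABLE_B52.items():
        print(f"{stratum}: {'ok' if row_is_consistent(row) else 'MISMATCH'}")
    print("Column totals match:", column_totals(TABLE_B52.values()) == REPORTED_TOTAL)
```

The same check applies unchanged to any other table in this appendix. Tables without an "Excluded schools" column simply have one element fewer per row.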


B.6 Italy 


• School-level exclusions consisted of schools for children with special needs, schools with fewer than six students in the target grade, schools with Slovenian as the language of instruction, and schools in remote areas or on small islands. Within-school exclusions consisted of functionally disabled students.
• Explicit stratification was performed by geographic region (North, Central, South).
• Implicit stratification was applied by school type (public, private) and performance group (5), giving a total of 30 implicit strata.
• Small schools were selected with equal probabilities.
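Implicit stratification is commonly implemented in two steps. The school frame within each explicit stratum is sorted on the implicit variables, and a systematic sample is then drawn with probability proportional to size (PPS), which spreads the sample across school types and performance groups. The minimal sketch below shows that general technique. It assumes a frame of dictionaries with hypothetical keys `type`, `perf_group` and `mos` (measure of size), and it is not the ICILS sampling software.

```python
import random


def sort_by_implicit_strata(frame):
    """Order schools by the implicit stratification variables (school type, then
    performance group) so that a systematic draw spreads the sample across them."""
    return sorted(frame, key=lambda school: (school["type"], school["perf_group"]))


def systematic_pps(frame, n, rng):
    """Systematic PPS selection of n schools from an ordered frame.

    Assumes every school's measure of size (MOS) is smaller than the sampling
    interval, so no school can be hit twice.
    """
    total = sum(school["mos"] for school in frame)
    interval = total / n
    point = rng.random() * interval  # random start in [0, interval)
    selected, cumulative = [], 0.0
    for school in frame:
        cumulative += school["mos"]
        # Every selection point inside this school's cumulative-MOS range picks it.
        while len(selected) < n and point < cumulative:
            selected.append(school["id"])
            point += interval
    return selected


if __name__ == "__main__":
    rng = random.Random(2018)
    # Hypothetical frame for one explicit stratum (region); all values are invented.
    frame = [
        {"id": f"S{i:02d}", "type": rng.choice(["private", "public"]),
         "perf_group": rng.randint(1, 5), "mos": rng.randint(20, 120)}
        for i in range(40)
    ]
    print(systematic_pps(sort_by_implicit_strata(frame), n=6, rng=rng))
```

Because the frame is sorted before the systematic draw, the selected schools are spread across the implicit strata roughly in proportion to their combined size.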


Table B.6.1: Allocation of student sample in Italy 


School participation status—Student survey 


Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools
North | 66 | 0 | 66 | 0 | 0 | 0
Central | 28 | 0 | 24 | 4 | 0 | 0
South | 56 | 0 | 53 | 3 | 0 | 0
Total | 150 | 0 | 143 | 7 | 0 | 0

Note: No schools with student participation rate below 50% were found. 
Table B.6.2: Allocation of teacher sample in Italy 
School participation status—Teacher survey 

Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools
North | 66 | 0 | 66 | 0 | 0 | 0
Central | 28 | 0 | 24 | 4 | 0 | 0
South | 56 | 0 | 51 | 3 | 0 | 2
Total | 150 | 0 | 141 | 7 | 0 | 2


Note: Two schools were regarded as non-participating because the within-school participation rate was below 50%. 
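School participation is often summarized as two rates. The rate before replacement counts originally sampled schools only, and the rate after replacement also counts replacement schools. The sketch below computes unweighted versions of both from the Italy totals rows. The official ICILS rates are weighted, and leaving ineligible and excluded schools out of the denominator is an assumption of this sketch.

```python
def participation_rates(total, ineligible, sampled, first, second, excluded=0):
    """Unweighted school participation rates in percent.

    Returns (before replacement, after replacement). Eligible schools are taken
    to be total - ineligible - excluded; this denominator is an assumption of
    the sketch, and the official ICILS rates are weighted.
    """
    eligible = total - ineligible - excluded
    before = 100 * sampled / eligible
    after = 100 * (sampled + first + second) / eligible
    return before, after


if __name__ == "__main__":
    # Italy, totals rows of Tables B.6.1 (students) and B.6.2 (teachers)
    surveys = {"Students": (150, 0, 143, 7, 0), "Teachers": (150, 0, 141, 7, 0)}
    for survey, counts in surveys.items():
        before, after = participation_rates(*counts)
        print(f"{survey}: {before:.1f}% before, {after:.1f}% after replacement")
```

For Italy, the student survey gives 95.3% before replacement and 100.0% after replacement. The teacher survey gives 94.0% before and 98.7% after.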




B.7 Kazakhstan 


• School-level exclusions consisted of schools for children with special needs, schools with fewer than five students in the target grade, Uighur schools, Uzbek schools, Tajik schools, and schools with other languages of instruction. Within-school exclusions consisted of intellectually disabled students, functionally disabled students, and non-native language speakers.
• Explicit stratification was performed by urbanization (urban, rural) and language of instruction (4), resulting in eight explicit strata.
• No implicit stratification was applied.
• The sample was disproportionally allocated to explicit strata.
• Schools were oversampled to allow for better estimates for the different language groups.
• Small schools were selected with equal probabilities.
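With disproportional allocation, each school in an oversampled stratum stands for fewer schools in the population, and the base weights correct for this. The sketch below is a simplification. It assumes equal-probability selection within each stratum (the actual design uses PPS) and uses hypothetical frame sizes. Under those assumptions the base weight in stratum h is N_h / n_h, with n_h taken from Table B.7.1.

```python
# Total sampled schools per explicit stratum, from Table B.7.1.
SAMPLED = {
    "Urban - Kazakh only": 30, "Urban - Russian only": 20,
    "Urban - Kazakh & Russian": 30, "Urban - Other": 14,
    "Rural - Kazakh only": 36, "Rural - Russian only": 10,
    "Rural - Kazakh & Russian": 30, "Rural - Other": 16,
}
# HYPOTHETICAL numbers of schools per stratum in the frame (illustration only).
FRAME_SIZE = {
    "Urban - Kazakh only": 600, "Urban - Russian only": 300,
    "Urban - Kazakh & Russian": 450, "Urban - Other": 28,
    "Rural - Kazakh only": 2880, "Rural - Russian only": 400,
    "Rural - Kazakh & Russian": 1200, "Rural - Other": 48,
}


def base_weights(frame_size, sampled):
    """School base weight N_h / n_h under equal-probability selection within stratum h."""
    return {h: frame_size[h] / sampled[h] for h in sampled}


if __name__ == "__main__":
    for stratum, weight in base_weights(FRAME_SIZE, SAMPLED).items():
        print(f"{stratum:26s} weight {weight:6.1f}")
```

In this illustration the heavily oversampled "Other" language strata receive weights of 2 and 3. A large stratum such as "Rural - Kazakh only" receives 80. The weighted school count still adds up to the frame total.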


Table B.7.1: Allocation of student sample in Kazakhstan 


School participation status—Student survey 


Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools
Urban - Kazakh only | 30 | 0 | 30 | 0 | 0 | 0
Urban - Russian only | 20 | 0 | 20 | 0 | 0 | 0
Urban - Kazakh & Russian | 30 | 0 | 30 | 0 | 0 | 0
Urban - Other | 14 | 1 | 13 | 0 | 0 | 0
Rural - Kazakh only | 36 | 0 | 36 | 0 | 0 | 0
Rural - Russian only | 10 | 0 | 10 | 0 | 0 | 0
Rural - Kazakh & Russian | 30 | 0 | 29 | 0 | 0 | 1
Rural - Other | 16 | 1 | 15 | 0 | 0 | 0
Total | 186 | 2 | 183 | 0 | 0 | 1

Note: One school with student participation rate below 50% was found. 


Table B.7.2: Allocation of teacher sample in Kazakhstan 


School participation status—Teacher survey 


Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools
Urban - Kazakh only | 30 | 0 | 30 | 0 | 0 | 0
Urban - Russian only | 20 | 0 | 20 | 0 | 0 | 0
Urban - Kazakh & Russian | 30 | 0 | 30 | 0 | 0 | 0
Urban - Other | 14 | 1 | 13 | 0 | 0 | 0
Rural - Kazakh only | 36 | 0 | 36 | 0 | 0 | 0
Rural - Russian only | 10 | 0 | 10 | 0 | 0 | 0
Rural - Kazakh & Russian | 30 | 0 | 30 | 0 | 0 | 0
Rural - Other | 16 | 1 | 15 | 0 | 0 | 0
Total | 186 | 2 | 184 | 0 | 0 | 0

Note: No schools with teacher participation rate below 50% were found. 




B.8 Korea, Republic of 


• School-level exclusions consisted of geographically inaccessible schools, schools with fewer than five students in the target grade, and schools with a different curriculum (physical education schools). Within-school exclusions consisted of intellectually disabled students, functionally disabled students, and non-native language speakers.
• Explicit stratification was performed by urbanization (urban, suburban, rural) and school type (boys, girls, mixed), resulting in nine explicit strata.
• No implicit stratification was applied.
• The sample was disproportionally allocated to explicit strata.
• Schools were oversampled to allow for better estimates for rural schools, boys' schools, and girls' schools.
• Small schools were selected with equal probabilities.
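One common way to give small schools equal selection probabilities within a PPS design is to assign the same measure of size to every school below a size threshold. The sketch below assumes that approach, a threshold of 20 target-grade students, and hypothetical enrolments. The report does not state which threshold or method ICILS used.

```python
def selection_probabilities(enrolments, n, threshold):
    """PPS inclusion probabilities with an equal-probability floor for small schools.

    Each school's measure of size (MOS) is its target-grade enrolment, except that
    schools below `threshold` all get MOS = threshold, so small schools share one
    selection probability. Assumes no probability reaches 1 (no certainty schools).
    """
    mos = [max(count, threshold) for count in enrolments]
    total = sum(mos)
    return [n * m / total for m in mos]


if __name__ == "__main__":
    # Hypothetical stratum: target-grade enrolments of seven schools, two to be sampled.
    probs = selection_probabilities([5, 12, 20, 60, 150, 200, 250], n=2, threshold=20)
    for p in probs:
        print(f"{p:.3f}")
```

The three schools at or below the threshold end up with the same probability. Larger schools keep probabilities proportional to their enrolment, and the probabilities still sum to the sample size.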


Table B.8.1: Allocation of student sample in Korea 




School participation status—Student survey 


Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools
Urban - Boys | 10 | 0 | 10 | 0 | 0 | 0
Urban - Girls | 10 | 0 | 10 | 0 | 0 | 0
Urban - Mixed | 34 | 0 | 34 | 0 | 0 | 0
Suburban - Boys | 12 | 0 | 12 | 0 | 0 | 0
Suburban - Girls | 12 | 0 | 12 | 0 | 0 | 0
Suburban - Mixed | 38 | 0 | 38 | 0 | 0 | 0
Rural - Boys | 8 | 0 | 8 | 0 | 0 | 0
Rural - Girls | 8 | 0 | 8 | 0 | 0 | 0
Rural - Mixed | 18 | 0 | 18 | 0 | 0 | 0
Total | 150 | 0 | 150 | 0 | 0 | 0

Note: No schools with student participation rate below 50% were found. 
Table B.8.2: Allocation of teacher sample in Korea 
School participation status—Teacher survey 

Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools
Urban - Boys | 10 | 0 | 10 | 0 | 0 | 0
Urban - Girls | 10 | 0 | 10 | 0 | 0 | 0
Urban - Mixed | 34 | 0 | 34 | 0 | 0 | 0
Suburban - Boys | 12 | 0 | 11 | 0 | 0 | 1
Suburban - Girls | 12 | 0 | 12 | 0 | 0 | 0
Suburban - Mixed | 38 | 0 | 36 | 0 | 0 | 2
Rural - Boys | 8 | 0 | 8 | 0 | 0 | 0
Rural - Girls | 8 | 0 | 8 | 0 | 0 | 0
Rural - Mixed | 18 | 0 | 18 | 0 | 0 | 0
Total | 150 | 0 | 147 | 0 | 0 | 3

Note: No schools with teacher participation rate below 50% were found. 




B.9 Luxembourg 


• No school-level exclusions were applied. Within-school exclusions consisted of intellectually disabled students, functionally disabled students, and non-native language speakers.
• Explicit stratification was performed by curriculum (schools following the national curriculum, schools following a different curriculum), resulting in two explicit strata.
• No implicit stratification was applied.
• A census of all schools and students was conducted. All variance estimates were computed using schools as variance strata.
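With a census there is no sampling of schools, so each school can serve as its own variance stratum. The sketch below is a simplified jackknife (JK2). It assumes that students within each school are randomly split into two variance units and that all full-sample weights equal 1. The function name and the split rule are illustrative, not the ICILS variance-estimation procedure.

```python
import random


def jk2_mean_variance(scores_by_school, seed=0):
    """Mean and jackknife (JK2) sampling variance with each school as a variance stratum.

    Within each school, students are randomly split into two variance units. For the
    replicate belonging to school s, units 0 and 1 of that school get weights 2 and 0;
    all other students keep weight 1 (census, so full-sample weights are 1).
    """
    rng = random.Random(seed)
    records = []  # (school index, variance unit, score)
    for s, scores in enumerate(scores_by_school):
        shuffled = scores[:]
        rng.shuffle(shuffled)
        records.extend((s, pos % 2, score) for pos, score in enumerate(shuffled))

    def weighted_mean(weights):
        return sum(w * y for w, (_, _, y) in zip(weights, records)) / sum(weights)

    full = weighted_mean([1.0] * len(records))
    variance = 0.0
    for s in range(len(scores_by_school)):
        weights = [(2.0 if unit == 0 else 0.0) if school == s else 1.0
                   for school, unit, _ in records]
        variance += (weighted_mean(weights) - full) ** 2
    return full, variance


if __name__ == "__main__":
    mean, var = jk2_mean_variance([[400, 600], [500, 500]])
    print(mean, var)  # 500.0 2500.0
```

Only schools whose students differ contribute to the variance. In the example the second school adds nothing because both of its students score 500.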


Table B.9.1: Allocation of student sample in Luxembourg 


School participation status—Student survey 


Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools
Schools following national curriculum | 33 | 0 | 32 | 0 | 0 | 1
Schools with different curriculum | 8 | 0 | 6 | 0 | 0 | 2
Total | 41 | 0 | 38 | 0 | 0 | 3

Note: No schools with student participation rate below 50% were found. 
Table B.9.2: Allocation of teacher sample in Luxembourg 
School participation status—Teacher survey 

Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools
Schools following national curriculum | 33 | 0 | 23 | 0 | 0 | 10
Schools with different curriculum | 8 | 0 | 5 | 0 | 0 | 3
Total | 41 | 0 | 28 | 0 | 0 | 13

Note: Ten schools were regarded as non-participating because the within-school participation rate was below 50%.
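The notes to these tables apply one rule per survey: a school whose within-school participation rate is below 50% counts as non-participating. A school can therefore participate in the student survey but not in the teacher survey. The minimal sketch below implements that rule with hypothetical counts. How eligible respondents are counted is an assumption.

```python
def school_status(n_eligible, n_responding, threshold=0.5):
    """Classify a sampled school for one survey (students or teachers).

    A school whose within-school participation rate is below `threshold`
    (50% in the notes to these tables) is treated as non-participating.
    """
    if n_eligible <= 0:
        raise ValueError("a school needs at least one eligible respondent")
    rate = n_responding / n_eligible
    return "participating" if rate >= threshold else "non-participating"


if __name__ == "__main__":
    # Hypothetical teacher counts for three schools
    for eligible, responding in [(20, 9), (20, 10), (12, 11)]:
        print(eligible, responding, school_status(eligible, responding))
```

A rate of exactly 50% is not "below 50%", so such a school counts as participating.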


B.10 Portugal 


• School-level exclusions consisted of schools with fewer than seven students in the target grade and international schools. Within-school exclusions consisted of intellectually disabled students, functionally disabled students, and non-native language speakers.
• Explicit stratification was implemented by school type (public, private) and subregions (23), resulting in 28 explicit strata.
• No implicit stratification was applied.
• Small school samples within regions necessitated disproportional sample allocations.
• Small schools were selected with equal probabilities.
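Table B.10.2 shows six sampled schools in most public subregion strata. A pattern like this arises when a minimum per stratum is imposed on strata that differ greatly in size. The sketch below illustrates that general mechanism. The floor of six schools, the frame sizes and the largest-remainder rounding rule are all hypothetical; this is not the documented ICILS allocation procedure.

```python
def allocate(sizes, n_total, floor):
    """Proportional allocation of n_total schools with a minimum of `floor` per stratum.

    Strata whose proportional share falls below the floor are fixed at the floor
    (or at their frame size, if smaller) and the rest of the sample is re-allocated
    proportionally among the remaining strata. Fractional shares are rounded down
    and leftover schools go to the largest fractional parts.
    """
    fixed = {}
    while True:
        free = {h: size for h, size in sizes.items() if h not in fixed}
        remaining = n_total - sum(fixed.values())
        free_total = sum(free.values())
        shares = ({h: remaining * size / free_total for h, size in free.items()}
                  if free_total else {})
        below = [h for h, share in shares.items() if share < floor]
        if not below:
            break
        for h in below:
            fixed[h] = min(floor, sizes[h])
    allocation = {h: int(share) for h, share in shares.items()}
    leftover = remaining - sum(allocation.values())
    by_remainder = sorted(shares, key=lambda h: shares[h] - allocation[h], reverse=True)
    for h in by_remainder[:leftover]:
        allocation[h] += 1
    allocation.update(fixed)
    return allocation


if __name__ == "__main__":
    # HYPOTHETICAL frame sizes (numbers of schools) for four strata
    print(allocate({"A": 400, "B": 120, "C": 30, "D": 20}, n_total=40, floor=6))
```

With these inputs the two small strata are raised to six schools each, and the largest stratum absorbs the reduction: {'A': 22, 'B': 6, 'C': 6, 'D': 6}.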




Table B.10.1: Allocation of student sample in Portugal 




School participation status—Student survey 


Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools
[Rows for the individual explicit strata, which are the same 28 strata listed in Table B.10.2, could not be recovered.]
Total | 220 | 1 | 189 | 11 | 0 | 19

Note: Seventeen schools were regarded as non-participating because the within-school participation rate was below 50%.




Table B.10.2: Allocation of teacher sample in Portugal 


School participation status—Teacher survey 


Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools
Public - Alto Minho | 6 | 0 | 6 | 0 | 0 | 0
Public - Cávado | 6 | 0 | 6 | 0 | 0 | 0
Public - Ave | 6 | 0 | 6 | 0 | 0 | 0
Public - Área Metropolitana do Porto | 18 | 0 | 17 | 0 | 0 | 1
Public - Alto Tâmega | 6 | 0 | 6 | 0 | 0 | 0
Public - Tâmega e Sousa | 6 | 0 | 5 | 0 | 0 | 1
Public - Douro | 6 | 0 | 6 | 0 | 0 | 0
Public - Terras de Trás-os-Montes | 6 | 0 | 6 | 0 | 0 | 0
Public - Oeste | 6 | 0 | 5 | 0 | 0 | 1
Public - Região de Aveiro | 6 | 0 | 6 | 0 | 0 | 0
Public - Região de Coimbra | 6 | 0 | 5 | 1 | 0 | 0
Public - Região de Leiria | 6 | 0 | 6 | 0 | 0 | 0
Public - Viseu Dão Lafões | 6 | 0 | 6 | 0 | 0 | 0
Public - Beira Baixa | 6 | 0 | 6 | 0 | 0 | 0
Public - Médio Tejo | 6 | 0 | 5 | 1 | 0 | 0
Public - Beiras e Serra da Estrela | 6 | 0 | 5 | 1 | 0 | 0
Public - Área Metropolitana de Lisboa | 28 | 0 | 26 | 0 | 0 | 2
Public - Alentejo Litoral | 6 | 0 | 5 | 0 | 0 | 1
Public - Baixo Alentejo | 6 | 0 | 6 | 0 | 0 | 0
Public - Lezíria do Tejo | 6 | 0 | 6 | 0 | 0 | 0
Public - Alto Alentejo | 6 | 0 | 6 | 0 | 0 | 0
Public - Alentejo Central | 6 | 0 | 6 | 0 | 0 | 0
Public - Algarve | 6 | 0 | 6 | 0 | 0 | 0
Public - Região Autónoma dos Açores | 6 | 0 | 4 | 1 | 0 | 1
Public - Região Autónoma da Madeira | 6 | 0 | 6 | 0 | 0 | 0
Private - Área Metropolitana do Porto | 7 | 0 | 5 | 2 | 0 | 0
Private - Área Metropolitana de Lisboa | 7 | 0 | 6 | 1 | 0 | 0
Private - Other Regions | 22 | 1 | 13 | 4 | 0 | 4
Total | 220 | 1 | 197 | 11 | 0 | 11

Note: Seven schools were regarded as non-participating because the within-school participation rate was below 50%. 




B.11 United States 


• School-level exclusions consisted of schools with fewer than two classes in the target grade and private schools. Within-school exclusions consisted of intellectually disabled students, functionally disabled students, and non-native language speakers.
• Explicit stratification was performed by poverty level (2), school type (public, private), and geographic region (Northeast, Midwest, South, West), resulting in 12 explicit strata.
• Implicit stratification was applied by school location (city, rural, suburban, town) and ethnicity status, giving a total of 96 implicit strata.
• Small schools were selected with equal probabilities.
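Table B.11.1 shows that the high-poverty strata contain only public schools, which is how 2 × 2 × 4 combinations reduce to 12 explicit strata. Suppose the 96 implicit strata are a full crossing of the four school locations with ethnicity status inside each explicit stratum. Then ethnicity status has two categories. That is an inference, not something the report states. A short Python sketch of the arithmetic:

```python
from itertools import product

REGIONS = ["Northeast", "Midwest", "South", "West"]
LOCATIONS = ["city", "rural", "suburban", "town"]
ETHNICITY_CATEGORIES = 2  # ASSUMPTION: inferred from 96 / (12 * 4), not stated in the source


def explicit_strata():
    """Poverty x school type x region, with private schools only in the
    low-poverty strata (as in Table B.11.1)."""
    high = [("High poverty", "Public", region) for region in REGIONS]
    low = [("Low poverty", school_type, region)
           for school_type, region in product(["Private", "Public"], REGIONS)]
    return high + low


def implicit_strata_count(n_explicit):
    """Implicit strata if location and ethnicity status are fully crossed
    within each explicit stratum."""
    return n_explicit * len(LOCATIONS) * ETHNICITY_CATEGORIES


if __name__ == "__main__":
    strata = explicit_strata()
    print(len(strata), implicit_strata_count(len(strata)))  # 12 96
```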


Table B.11.1: Allocation of student sample in the United States 


School participation status—Student survey 


Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools | Excluded schools
High poverty - Public - Northeast | 17 | 0 | 6 | 1 | 0 | 10 | 0
High poverty - Public - Midwest | 25 | 1 | 18 | 0 | 0 | 6 | 0
High poverty - Public - South | 68 | 1 | 58 | 3 | 2 | 4 | 0
High poverty - Public - West | 42 | 1 | 35 | 4 | 0 | 2 | 0
Low poverty - Private - Northeast | 7 | 0 | 1 | 2 | 0 | 4 | 0
Low poverty - Private - Midwest | 7 | 0 | 4 | 1 | 0 | 2 | 0
Low poverty - Private - South | 10 | 1 | 5 | 0 | 0 | 4 | 0
Low poverty - Private - West | 6 | 1 | 1 | 0 | 1 | 3 | 0
Low poverty - Public - Northeast | 34 | 0 | 12 | 2 | 1 | 19 | 0
Low poverty - Public - Midwest | 43 | 1 | 24 | 5 | 2 | 11 | 0
Low poverty - Public - South | 56 | 1 | 40 | 5 | 1 | 8 | 1
Low poverty - Public - West | 37 | 1 | 27 | 2 | 0 | 6 | 1
Total | 352 | 8 | 231 | 25 | 7 | 79 | 2

Note: Three schools were regarded as non-participating because the within-school participation rate was below 50%. 




Table B.11.2: Allocation of teacher sample in the United States 


School participation status—Teacher survey 


Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools | Excluded schools
High poverty - Public - Northeast | 17 | 0 | 5 | 1 | 0 | 11 | 0
High poverty - Public - Midwest | 25 | 1 | 17 | 0 | 0 | 7 | 0
High poverty - Public - South | 68 | 1 | 57 | 3 | 2 | 5 | 0
High poverty - Public - West | 42 | 1 | 33 | 4 | 0 | 4 | 0
Low poverty - Private - Northeast | 7 | 0 | 1 | 2 | 1 | 3 | 0
Low poverty - Private - Midwest | 7 | 0 | 4 | 1 | 0 | 2 | 0
Low poverty - Private - South | 10 | 1 | 5 | 0 | 0 | 4 | 0
Low poverty - Private - West | 6 | 1 | 1 | 0 | 1 | 3 | 0
Low poverty - Public - Northeast | 34 | 0 | 12 | 2 | 1 | 19 | 0
Low poverty - Public - Midwest | 43 | 1 | 25 | 5 | 2 | 10 | 0
Low poverty - Public - South | 56 | 1 | 40 | 5 | 1 | 8 | 1
Low poverty - Public - West | 37 | 1 | 26 | 2 | 0 | 7 | 1
Total | 352 | 8 | 226 | 25 | 8 | 83 | 2


Note: Seven schools were regarded as non-participating because the within-school participation rate was below 50%. 


B.12 Uruguay 


• School-level exclusions consisted of schools for children with special needs and rural schools. Within-school exclusions consisted of functionally disabled students.
• Explicit stratification was performed by school type (public, private), location (Montevideo, Interior Urban), and school type (high school/Lyceum, vocational school/Utu), resulting in six explicit strata.
• No implicit stratification was applied.
• The sample was disproportionally allocated to explicit strata. Schools were oversampled to allow for better estimates for public and private schools, Montevideo and Interior Urban schools, as well as Lyceum (high school) and Utu schools (vocational schools).
• To enable overlap control for PISA 2018, schools in one stratum were selected with equal probabilities, and in two other strata the selection probabilities were capped at 0.5.
• Within census strata, all variance estimates were computed using schools as variance strata.
• Small schools were selected with equal probabilities.
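The report says only that selection probabilities in two Uruguayan strata were capped at 0.5. A common way to apply such a cap in PPS sampling is to fix the affected schools at the cap and spread the remaining expected sample over the other schools in proportion to size. The sketch below assumes that redistribution rule and uses hypothetical measures of size.

```python
def capped_pps_probabilities(mos, n, cap=0.5, tol=1e-12):
    """PPS inclusion probabilities for a sample of n schools, capped at `cap`.

    Schools whose probability would exceed the cap are fixed at the cap, and the
    remaining expected sample size is re-distributed proportionally to size among
    the uncapped schools, repeating until no probability exceeds the cap.
    """
    if n > cap * len(mos):
        raise ValueError("sample size cannot be reached with this cap")
    capped = set()
    while True:
        free = [i for i in range(len(mos)) if i not in capped]
        remaining = n - cap * len(capped)
        free_total = sum(mos[i] for i in free)
        probs = [cap if i in capped else remaining * mos[i] / free_total
                 for i in range(len(mos))]
        over = [i for i in free if probs[i] > cap + tol]
        if not over:
            return probs
        capped.update(over)


if __name__ == "__main__":
    # Hypothetical measures of size for six schools in one stratum, n = 2
    print([round(p, 3) for p in capped_pps_probabilities([400, 300, 100, 100, 50, 50], n=2)])
```

In this example the two largest schools would otherwise have probabilities of 0.8 and 0.6, so both are capped at 0.5. The others rise from 0.2 and 0.1 to [0.333, 0.333, 0.167, 0.167], which keeps the expected sample size at two.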




Table B.12.1: Allocation of student sample in Uruguay 




School participation status—Student survey 


Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools
Public - Montevideo - Lyceum | 30 | 1 | 23 | 3 | 0 | 3
Public - Montevideo - Utu | 27 | 1 | 25 | 0 | 0 | 1
Public - Interior urban - Lyceum | 60 | 1 | 57 | 1 | 0 | 1
Public - Interior urban - Utu | 30 | 1 | 28 | 0 | 0 | 1
Private - Montevideo - Lyceum | 18 | 0 | 15 | 3 | 0 | 0
Private - Interior urban - Lyceum | 12 | 1 | 11 | 0 | 0 | 0
Total | 177 | 5 | 159 | 7 | 0 | 6


Note: Six schools were regarded as non-participating because the within-school participation rate was below 50%.

Table B.12.2: Allocation of teacher sample in Uruguay


School participation status—Teacher survey 


Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools
Public - Montevideo - Lyceum | 30 | 1 | 18 | 2 | 0 | 9
Public - Montevideo - Utu | 27 | 1 | 11 | 0 | 0 | 15
Public - Interior urban - Lyceum | 60 | 1 | 42 | 1 | 0 | 16
Public - Interior urban - Utu | 30 | 1 | 24 | 0 | 0 | 5
Private - Montevideo - Lyceum | 18 | 0 | 12 | 3 | 0 | 3
Private - Interior urban - Lyceum | 12 | 1 | 8 | 0 | 0 | 3
Total | 177 | 5 | 115 | 6 | 0 | 51


Note: Fifty-one schools were regarded as non-participating because the within-school participation rate was below 50%.

Benchmarking participants

B.13 Moscow, Russian Federation


• School-level exclusions consisted of schools with fewer than seven students. Within-school exclusions consisted of intellectually disabled students, functionally disabled students, and non-native language speakers.
• Explicit stratification was performed by school performance, resulting in five explicit strata.
• Implicit stratification was applied by school type (public, private), giving a total of 27 implicit strata.
• In the first two explicit strata, schools were selected with equal probabilities.
• Small schools were selected with equal probabilities.
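Under equal-probability selection, the base weight is the inverse of the sampling fraction. The example assumes that each of the first two performance bands contains exactly 50 schools, as their labels suggest, and takes the 26 sampled schools per band from Table B.13.1. On that basis each school has a selection probability of 26/50 = 0.52 and a base weight of 50/26 ≈ 1.923:

```python
from fractions import Fraction


def equal_probability_weight(frame_size, n_sampled):
    """Base weight for equal-probability selection: the inverse of n / N."""
    return Fraction(frame_size, n_sampled)


if __name__ == "__main__":
    # ASSUMPTION: each rank band holds exactly the number of schools its label implies.
    bands = {"Top 1-50 schools": (50, 26), "Top 51-100 schools": (50, 26)}
    for band, (frame_size, n_sampled) in bands.items():
        weight = equal_probability_weight(frame_size, n_sampled)
        print(f"{band}: probability {n_sampled / frame_size:.2f}, weight {float(weight):.3f}")
```

Using `Fraction` keeps the weight exact (25/13) until it is printed.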




Table B.13.1: Allocation of student and teacher sample in Moscow, Russian Federation 




School participation status—Student and teacher survey 


Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools
Top 1-50 schools | 26 | 0 | 26 | 0 | 0 | 0
Top 51-100 schools | 26 | 0 | 26 | 0 | 0 | 0
Top 101-200 schools | 30 | 0 | 30 | 0 | 0 | 0
Top 201-300 schools | 30 | 0 | 30 | 0 | 0 | 0
Top 301 and above - all other schools | 38 | 0 | 36 | 1 | 1 | 0
Total | 150 | 0 | 148 | 1 | 1 | 0

Note: No schools with student participation rate below 50% were found. No schools with teacher participation rate below 50% were found. 


B.14 North Rhine-Westphalia, Germany 


• School-level exclusions consisted of special education schools and very small schools (fewer than three students in the target grade). Within-school exclusions consisted of intellectually disabled students, functionally disabled students, and non-native language speakers.
• Explicit stratification for regular schools was performed by school type (Gymnasium, non-Gymnasium). Special education schools with students able to do the test were placed in a separate stratum, resulting in three explicit strata.
• Implicit stratification for regular schools was applied by socioeconomic status predictor (3 levels), giving a total of four implicit strata.
• School sample overlap between ICILS 2018, PISA 2018, and the national assessment of educational standards 2018: the ICILS sample was selected using minimum overlap control with respect to both surveys.
• Small schools were selected with equal probabilities.
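The report does not describe how minimum overlap control was implemented. A generic technique that achieves it is coordinated sampling with permanent random numbers (PRNs). Each school keeps one random number, and each survey selects the schools whose number falls in its own window, so surveys with disjoint windows share no schools. The sketch below uses simple Poisson sampling with equal probabilities and hypothetical data, and it is not the IEA procedure.

```python
import random


def prn_sample(prn, fraction, start):
    """Poisson sampling with permanent random numbers (PRNs).

    Selects every unit whose PRN falls in the window [start, start + fraction),
    wrapping around 1. Surveys that use non-overlapping windows on the same PRNs
    share no units, which is one way to obtain minimum overlap.
    """
    end = start + fraction
    return {unit for unit, u in prn.items()
            if start <= u < end or (end > 1 and u < end - 1)}


if __name__ == "__main__":
    rng = random.Random(42)
    prn = {f"school_{i:03d}": rng.random() for i in range(500)}  # hypothetical frame
    survey_a = prn_sample(prn, fraction=0.2, start=0.0)
    survey_b = prn_sample(prn, fraction=0.2, start=0.5)
    print(len(survey_a), len(survey_b), len(survey_a & survey_b))  # overlap is 0
```

A third survey, such as the national assessment, would need its own window that overlaps neither of the first two, for example [0.25, 0.45).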


Table B.14.1: Allocation of student sample in North Rhine-Westphalia, Germany 


School participation status—Student survey 


Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools
Gymnasium | 42 | 0 | 38 | 4 | 0 | 0
Non-Gymnasium | 72 | 3 | 65 | 1 | 0 | 3
Special education schools | 1 | 0 | 1 | 0 | 0 | 0
Total | 115 | 3 | 104 | 5 | 0 | 3
Note: No schools with student participation rate below 50% were found. 
Table B.14.2: Allocation of teacher sample in North Rhine-Westphalia, Germany 
School participation status—Teacher survey 
Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools
Gymnasium | 42 | 0 | 36 | 4 | 0 | 2
Non-Gymnasium | 72 | 3 | 65 | 1 | 0 | 3
Special education schools | 1 | 0 | 1 | 0 | 0 | 0
Total | 115 | 3 | 102 | 5 | 0 | 5
Note: Three schools were regarded as non-participating because the within-school participation rate was below 50%. 




APPENDIX C 


National items excluded from scaling 


[Rotated table with the columns Country, Set to "not administered" in database, Reason for deletion, Excluded from scaling, and Reason for exclusion. Countries listed: Chile, Denmark, Finland, France, Germany, Italy, Kazakhstan, Korea (Republic of), Moscow (Russian Federation), North Rhine-Westphalia (Germany), Portugal, United States, and Uruguay. The reasons recorded for item deletion or exclusion are country DIF > 1.3 and inter-coder reliability < 70%.]
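The two exclusion criteria recorded in this appendix can be expressed as a simple check per item and country. The sketch assumes that country DIF is the national minus the international item difficulty in logits, and it applies the stated limit literally rather than to the absolute value. Both points, the class, and the item identifiers are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ItemStatistics:
    item_id: str                    # hypothetical identifier
    country_dif: float              # ASSUMED: national minus international difficulty, in logits
    intercoder_reliability: Optional[float] = None  # agreement share (0-1); None if auto-scored


def exclusion_reasons(item, dif_limit=1.3, reliability_limit=0.70):
    """Return the reasons (if any) for excluding an item from a country's scaling."""
    reasons = []
    if item.country_dif > dif_limit:
        reasons.append(f"country DIF > {dif_limit}")
    if item.intercoder_reliability is not None and item.intercoder_reliability < reliability_limit:
        reasons.append(f"inter-coder reliability < {reliability_limit:.0%}")
    return reasons


if __name__ == "__main__":
    items = [ItemStatistics("ITEM_1", 1.45), ItemStatistics("ITEM_2", 0.2, 0.64),
             ItemStatistics("ITEM_3", 0.3, 0.91)]
    for item in items:
        print(item.item_id, exclusion_reasons(item) or "kept")
```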




APPENDIX D 


BUISSI|A| 


(syooq OOZ UeY} e1OW 
SaSed4OOq SJOW JO 3944} || 0} UZNOUF Ss 


rev) (S40Oq OOZ-TOT) Sased400g Om} ||4 0} YSNOUF “p 
c (Aso8ayed 39UdI9J9I = BUISSI|\\) (S¥OO OOT-9Z) aSed4OOq auO || 0} YSNOUF *S 
= AJO89}€9 9DUdIJ9I S| 98e}UdIIEd (S¥OOd GZ-TT) JOYS SUO ||Y 0} SNOUT "Z 
5} VYOd ysously YM UO!}do “8ulpoo Awwng (SOO OT -O) Maj AJOA JO QUON ‘T PLOZSI awoy ye syoog 
a BUISSI|A| 
= (Aso8a}e9 3DUdIJAJaJ = BUISSI|\\) [asensue| JayOUY] = 7 
Oo Aso8a}e9 adUa4aJaJ S| 88e]UadIad [Z adensue| ay1O] = 
Cc ysaysiy ayy YIM UOldo ‘saluqUNOD [T e8ensue| Joyo] = Z 
oO WOd ssoide SalieA apod AWLuNp JO yysue] [3S0q Jo a8ensueq] = T SODZS| awoy je asensue] 
= BUISSI|A| 
re} (As08a}e9 39UdIJ9J9J = BUISSI|\\) [AJJUNOD JAUJOUY] = 7 
ue a DOUDIJII SI Selman [g dnossy/AsquNod JaUyIO] = € JrODZSI [uelpsens ajew] Jo Jaye - yysig Jo AsquNOD 
Oo ysaysziy ayy YM uolqdo ‘saluyUNOD [V dnoig/Asqunod JaujO] = Z qvoonzs| [ueIpsens ajewWa}] JO JaY,O — YJsIg Jo AsqUNO 
Vod ssojoe SalJeA apoo AWLWUNp Jo yj8ue L a 
Y l P $0 YjsUs)] [3S9} Jo AyjuNOD] = VvOOcs!| NO, — Yylg Jo AUNOD 
BUISSI|A| 
|AA9]| G3ADS|] a9}dwWod 0} adxe JOU Op |G 
—N [Z |9A2| GDS] 7 
a (AsJo8a}e9 39UdI9J9J = BUISSI|\\) [Ee JeA2| GADSII 
Ta) Asosayeo eoletele S| a8ejUaoJad [S JO p |PA2| G4DSI] ‘7 
& Vod ysaysiy YUM UOI}doO ‘8UIPOd AWLWUNG [8 40‘7 ‘9 [@A9] GADSI] 'T ddodsIs 3}98|CWOd OJ UOIJWeINpa JO SjaAa] payDadxa JUapNIS 
_ Tues BUISSI|\| 
= WOd oAdoy anjeA I4SIH uolzedno90 jeyWsJed 40 Jaa] JS9YBIH 
BUISSI|A| 
= [Z |@A2| G4DS|] A49|dwWod 0} adxa JOU Op |G 
[Z A298] GADSI] 7 
5 (Aso8a}e9 39UdI9J9J = BUISSI|\\) [S J@A2] GSDSIIE 
~ ei ane ie S| a8ejUso.4ed [S JO p JPA9| G4DSI]'Z 
oD VOd SaY3IY YIM UO!IdO “BUIPOD AWWUNG [8 10°Z “9 |9A29| GADSI) 7 C4OSIH uol}eINpe jejUaJed JO |2Aa| JSOy3!H 
s tO SUISSI|\| 
00 SlEW 
Ss yoadid Ot a|ewa+ YSGN39 Japua 
T ued BUISSI|A| 
c VOd o’AdoD One 4OV asy 
5) pag anjeA S}1807 NW HOS juawanalyse UedLU JOOYIS paysnipy 
"oO Alogayed ddUaIdJa Se LUNYeI4S ]S9BIe| 
a pag dU} YM wuNyeys Jad ajgeien Aung [x] FIVYLSal QI wnyetys 
YW) Jossoi30y s8ulpo> sanjeA owen ajqee) 


ICILS 2018 TECHNICAL REPORT 


260 


sesodund 
Aep AJaAd JOU JN 489M e BDU jsed} IV G8t9zS 49YJO JOJ |OOYDS aPIs}NE — | D| 8ulsn Uso MOH 
BUISSI|A| sasodund payejas-jooyos 
Aep AlaA3‘S DETOZS JOJ |OOYDS ApIs}NE - [Dd] 8ulsn UdaYyO MOH 
Aep Asad JOU JN Y99M e BDUO 4Sed} JV “7 sasodind 
(AJO39}e9 9DUIIAJ9I = SUISSI|) 89M AJBAP JOU ING YJUOW e SDUO }Sea] P'S gstozs 49YJO JOJ JOOYDS HY — | D] 8ulsN UaYO MOH 
AJO89}e9 9DUdIAJOI S| 98eJUaoIAd Y}UOWW € BDUO UL} SS97 ‘7 sasodind 
VYOd ysoysly YIM UO!do “SuIpoo AWWUNG Jano V8T9ZS Ppa}e|aJ-|OOYDS JO} |OOYDS PY — | D] 8ulsN Ud}JO MOH 
BUISSI|A| 4/T9ZS 4IOMJOU J9}NAWOD e Ul BUDO - BUIYDed | 
SIU} paused] JOABU dALY |G ALT 9ZS }9UJd9}U] 94} UO UOIJEWOJU! BUIPUl - SuUlyded | 
JasAw 4Yysne} | y GLT9ZS SDIIAIP | D| UO SBul}Jas BulsUeYD - BUlyDea| 
(AsO89}e9 BDUSIAJII = BUISSI\) spualiy Ap D/T9ZS suolje}Uasad jeqsIp 8ulyea49/}Ipy - Sulyoedy 
AJOB9}e9 9DUSI9J9I S| 9BejUSIIAd Ajlwes AZ G/T 9ZS SJUSLUNDOP Je4N81P 8ulje949/}IPF - 8ulyoea 
Vod ysaysiy YIM Udo “BuIpod AWWNG ssayoed} AT VLT9ZS J@UJA}U] BY} JAAO BuUlJeES!UNLWWWOD - Bulydeda | 
BUISSI|A| 
SJOW JO SIEdA UdASS °G 
sJeaA UAVS ULL SS9| ING SJedA SAY 4Se9] IV “7 
(AJO89}E9 9DUTIAJOI = SUISSI[\) sdeak aAY UCU} SS9| 3Ng SAedA 3dJY] 4Sed] WV J9ITOZSI sauoydyseus 3uUISn 3U0] MO 
AJOB9}E9 9DUTI9J91 S| 9Be}UDOIAd sdeadd dd4yy UY} SS9| ING 4edA DUO jSed] HV TZ g9TOZSI SJaPedJ-3/S}9|qGe} BUISN 3UO| MOH 
VWod ysaysiy YIM UOI4dO “3uUIPOD AWWNG Jed duo Ue} SS97 "TF VITOZSI s4aynduod doyde|/doyy\sap 8ulsn 83U0| MO 
(AJO39}e9 BDUIIAJII = SUISSI|) BUISSI|A| 
AJO8a}e9 9DUSIIJ9I S| adeUdDINd ON 'Z 
WYOd ysaysiy YM Udo “8uIpod AWWNG SOA T gStoZSI SWOY Je UO!DISUUOD JouayU 
BUISSI|A| 
SJOW JO 9dNY | 
(As089}e9 BDUSIAJOI = BUISSI\) OML'S 
AJO89}e€9 BdUdIAJa S| 98eJUadIEd sud % AVSTOZSI Japeas-3/jajqe| — swoy ye sad|Aaq 
Wd ysaysiy YiM UOIIdoO ‘BuUIPOD AWWNG SUON 'T VVST9ZSI dojde|/doy\saq - awoy je sadiAeq 
Jossaj3ay sSuIpo; sane owen ajqeuen 


261 


APPENDICES 


ajdoad Jayjo 0} SalqAlqoe JO sjUaAa 


(0Z9CS ynoge UO!EWOJU! PAEMIOJ JO PUBS - [D9] BUISA 

duI|UO paysod aney ajdoad 

lOZ9ZS JAYJO JEU} SABE! JO SODPIA YOJEMA - [DI] 8uls¢A) 

([aqn No, 

JO WesJ8e4SU| ‘YOOqaIe 4] °3°9) S9!}!JUNWWOD SUI|UO JO 

HOZODZSI_ | SYIOMIJAU JLIDOS Ul OAPIA JO SABE! JSOg — 1D] BUISN 

([4as8o]g I|QuINL ‘SSatgPJO/A] ‘3'3) 

J0Z97ZS 80|q UMO JNOA JO} $}SOd APIA - LD] 8UISN 

Saqisqam [Jamsue pue uolysanb Wyo] 410 suuNsoy 

4O0Z9ZS uo suolysanb sajdoad Jayjo waMsuy - [9D] SUIS 

Saqisqam [Jamsue pue Uolysanb 

4OZOZS WVP93H] JO SUUNIOJ UO SUOHSANb ysV — [9D] 8UIsSy 

eIPAW [2190S UO a}l| INOA ul suadde 

doz9zs }eEUM yNOGe sajyepdn pue sjsod SpA — | D| BuUISN 

SUISSI|A| 9|doad Jayjzo Jo ‘Ajlwes ‘Spuald 

Aep AlaA5'G 90Z97ZS 0} SABeSSAW JUCJSU! JO $}X9} PUSS - [Dd] BUIS/A) 

Aep Asana JOU JING Y9aM e BDUO 4Ssed] IV “7 (L4aqiA ‘ddyszeuA ‘adAxs] “8°e) JeYD CAPIA 

89M AJaAd JOU ING YJUOW e BDU0 4Sed} JV'E JO SdIOA ‘Bulsessaw JUe}SU! BuUISN ajdoad 4Jay}O 

(Aso89}e9 99Ua19J94 = BUISSI|\\) Y}UOLW € 3DUO UCU} SS97 ‘7 gozozs Jo ‘AjiWe4} ‘SPUAIIJ YUM 3}ed!UNWWWUOD — | D] 8UISN 
AJOB9}E9 9DUdIBJ9I S| 98e}UdDIEd JBARN 'T el Pa Je!90S 
Vod ysaysiy YM UO!}do “Bulpoo AWwng VOZ9ZS UO S}USAP JUSIIND Jnoge sMau a4seYs - [Dd] 8ulsN 
H6T9CS asedqam e pling - [D| 8uls¢q 

D6T9ZS DISNU Ppa JO BdNpoOIjd - [DI] 8ulsn 

[sdde] 10 auemyjos 

46T9TS solydess Jo 8ulquied “BuIMep asp) — | D] 8uIsA 

([yoyes9S JO V7 0807] Bulsn “3’a) sdde 

F6T9ZS JO S4dI 49S ‘sWesSO1d J9INAWOD SYA — [D] 8uUISN 

C6L9ZS SOAPIA JIPe JO Psoday - [Dd] 8uUIsA 

BUISSI|A| ([@ JUIOgJAMOd YOSOID!|[\] Sulsn “3"a) UOIZeJUasaId 

Aep Alaa5'G J6T9ZS ,MOYUsapl|s, ajdwis e ayeauD — 1D] sulsy 

Aep Asana JOU YN Y8aM e BDUO jsed] IV ([@ |20xg 
(Aso39}e9 9DUaIAJ9J = BUISSI|\\) eam AJBAP JOU 4NG YJUOW e SdUO jsed] IVE JYOSOIDI |] Sulsn “3°a) sydess Jojd JO eyep 9404s 
AJOB9}ED 9DUSIIJ9I S| 9Be}UdDIAd Y}UOLW € 9DUO UCU} SS97 ‘7 A6TL9ZS ‘suoljejnajed Op 0} yaayspeaids e asp) - [Dd] SUISA 
Vod ysaysiy YUM UOI}dO ‘8UIPOd AWWUNG JOABN T V6T9CS SJUDWNDOP Pd JO d}IAA — | D] Susy 
Jossaizoy suIpoZ sanjeA oWeN ajqelen 


ICILS 2018 TECHNICAL REPORT 


262 


([JUaWUdOjaAap jeln0s pue jeuOsiad ‘Uuo!ZeoNpe 


IETOSS Jeo1sAyd ‘soIY}a/|e40UU] ‘3'9) JAYIO — asn 1D 
[sajduexe jeuoijeu ayelidoidde 
HETZOTS Aue pp] |2UO!eIOA JO |ed149e1q - 9SN | D 
[4e|IWIS JO Salpnys 
DJEZNCS Jaynduwod ‘A8OjOUYIE} UO!JEWIOJU]] — ASN | D 
(‘09 “eweup 
Variable | Name
Using ICT - Search the Internet to find information about places to go or activities to do | IS2G21A
Using ICT - Read reviews on the Internet of things you might want to buy | IS2G21B
Using ICT - Read news stories on the Internet | IS2G21C
Using ICT - Search for online information about things you are interested in | IS2G21D
Using ICT - Use websites, forums, or online videos to find out how to do something | IS2G21E
Using ICT - Play single-player games | IS2G21F
Using ICT - Listen to downloaded or streamed music | IS2G21G
Using ICT - Watch downloaded or streamed shows or movies | IS2G21H
Values: 1. Never; 2. Less than once a month; 3. At least once a month but not every week; 4. At least once a week but not every day; 5. Every day; Missing
Coding: Dummy coding, option with highest percentage is reference category (Missing = reference category)
Regressor: PCA

Variable | Name
ICT use - Prepare reports or essays | IS2G22A
ICT use - Prepare presentations | IS2G22B
ICT use - Work online with other students | IS2G22C
ICT use - Complete [worksheets] or exercises | IS2G22D
ICT use - Organise your time and work | IS2G22E
ICT use - Take tests | IS2G22F
ICT use - Use software or applications to learn skills or a subject (e.g. mathematics tutoring software, language learning software) | IS2G22G
ICT use - Use the Internet to do research | IS2G22H
ICT use - Use coding software to complete assignments (e.g. [Scratch]) | IS2G22I
ICT use - Make video or audio productions | IS2G22J
Values: 1. Never; 2. Less than once a month; 3. At least once a month but not every week; 4. At least once a week but not every day; 5. Every day; Missing
Coding: Dummy coding, option with highest percentage is reference category (Missing = reference category)
Regressor: PCA

Variable | Name
ICT use - [Language arts: test language] | IS2G23A
ICT use - [Language arts: foreign and other national languages] | IS2G23B
ICT use - Mathematics [Add any appropriate national examples] | IS2G23C
ICT use - Sciences (general science and/or physics, chemistry, biology, geology, earth sciences) | IS2G23D
ICT use - Human sciences/Humanities/Social studies (history, geography, civics, law, economics, etc.) | IS2G23E
ICT use - Creative arts ([visual] arts, music, dance, ...) | IS2G23F
Values: 1. I don't study this subject / these subjects; 2. Never; 3. In some lessons; 4. In most lessons; 5. In every or almost every lesson; Missing
Coding: Dummy coding, option with highest percentage is reference category (Missing = reference category)
Regressor: PCA
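The coding rule used for these questionnaire variables, "Dummy coding, option with highest percentage is reference category (Missing = reference category)", can be illustrated with a short Python sketch. It assumes one plausible reading of the rule: the most frequent valid response becomes the reference category and receives no indicator, and missing responses are coded the same way as the reference category (a row of zeros). The function name `dummy_code` and the use of `None` for a missing response are illustrative assumptions. They are not part of the ICILS processing software.

```python
from collections import Counter

MISSING = None  # illustrative marker for an omitted or invalid response


def dummy_code(responses, categories):
    """Dummy-code one questionnaire item (illustrative sketch).

    The valid category with the highest count becomes the reference
    category and gets no indicator column. Missing responses are coded
    like the reference category, i.e. as a row of zeros.
    """
    counts = Counter(r for r in responses if r is not MISSING)
    # Most frequent valid category; ties go to the category listed first.
    reference = max(categories, key=lambda c: (counts[c], -categories.index(c)))
    indicators = [c for c in categories if c != reference]
    rows = [[1 if r == c else 0 for c in indicators] for r in responses]
    return reference, indicators, rows


ref, cols, rows = dummy_code([1, 2, 2, 3, MISSING, 2, 5], [1, 2, 3, 4, 5])
print(ref, cols)         # 2 [1, 3, 4, 5]
print(rows[0], rows[4])  # [1, 0, 0, 0] [0, 0, 0, 0]
```

In this made-up example, category 2 is the most frequent response, so it becomes the reference category and only categories 1, 3, 4, and 5 get indicator columns. The missing response in position 5 produces the same all-zero row as an answer of 2.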




APPENDICES 


Variable | Name
Tool use - Tutorial software or [practice programs] | IS2G24A
Tool use - Word-processing software (e.g. [Microsoft Word®]) | IS2G24B
Tool use - Presentation software (e.g. [Microsoft PowerPoint®]) | IS2G24C
Tool use - Spreadsheets (e.g. [Microsoft Excel®]) | IS2G24D
Tool use - Multimedia production tools (e.g. media capture and editing, web production) | IS2G24E
Tool use - Concept mapping software (e.g. [Inspiration®], [Webspiration®]) | IS2G24F
Tool use - Tools that capture real-world data (e.g. speed, temperature) digitally for analysis | IS2G24G
Tool use - Simulations and modelling software | IS2G24H
Tool use - Computer-based information resources (e.g. websites, wikis, encyclopedia) | IS2G24I
Tool use - Interactive digital learning resources (e.g. learning games or applications) | IS2G24J
Tool use - Graphing or drawing software | IS2G24K
Values: 1. Never; 2. In some lessons; 3. In most lessons; 4. In every or almost every lesson; Missing
Coding: Dummy coding, option with highest percentage is reference category (Missing = reference category)
Regressor: PCA

Variable | Name
School learning - Provide references to Internet sources | IS2G25A
School learning - Search for information using ICT | IS2G25B
School learning - Present information for a given audience or purpose using ICT | IS2G25C
School learning - Work out whether to trust information from the Internet | IS2G25D
School learning - Decide what information obtained from the Internet is relevant to include in school work | IS2G25E
School learning - Organize information obtained from Internet sources | IS2G25F
School learning - Decide where to look for information on the Internet about an unfamiliar topic | IS2G25G
School learning - Use ICT to collaborate with others | IS2G25H
Values: 1. To a large extent; 2. To a moderate extent; 3. To a small extent; 4. Not at all; Missing
Coding: Dummy coding, option with highest percentage is reference category (Missing = reference category)
Regressor: PCA


ICILS 2018 TECHNICAL REPORT 




Variable | Name
School learning - To change passwords regularly (e.g. network account, email, social media) | IS2G26A
School learning - To check the origin of emails before opening attachments | IS2G26B
School learning - To log out of a shared computer at the end of a session | IS2G26C
School learning - To share information on social media responsibly | IS2G26D
Values: 1. Yes; 2. No; Missing
Coding: Dummy coding, option with highest percentage is reference category (Missing = reference category)
Regressor: PCA

Variable | Name
Do well - Edit digital photographs or other graphic images | IS2G27A
Do well - Create a database (e.g. using [Microsoft Access®]) | IS2G27B
Do well - Write or edit text for a school assignment | IS2G27C
Do well - Search for and find relevant information for a school project on the Internet | IS2G27D
Do well - Build or edit a webpage | IS2G27E
Do well - Change the settings on your device to improve the way it operates | IS2G27F
Do well - Create a computer program, macro, or [app] (e.g. in [Basic, Visual Basic]) | IS2G27G
Do well - Set up a local area network of computers or other ICT | IS2G27H
Do well - Create a multi-media presentation (with sound, pictures, or video) | IS2G27I
Do well - Upload text, images, or video to an online profile | IS2G27J
Do well - Insert an image into a document or message | IS2G27K
Do well - Install a program or [app] | IS2G27L
Do well - Judge whether you can trust information you find on the Internet | IS2G27M
Values: 1. I know how to do this; 2. I have never done this but I could work out how to do this; 3. I do not think I could do this; Missing
Coding: Dummy coding, option with highest percentage is reference category (Missing = reference category)
Regressor: PCA






Variable | Name
Advances in ICT usually improve people's living conditions | IS2G28A
ICT helps us to understand the world better | IS2G28B
Using ICT makes people isolated in society | IS2G28C
With more ICT there will be fewer jobs | IS2G28D
People spend too much time using ICT | IS2G28E
ICT is valuable to society | IS2G28F
Advances in ICT bring many social benefits | IS2G28G
Using ICT may be dangerous for people's health | IS2G28H
Would like to study subjects related to ICT after [secondary school] | IS2G28I
Hope to find a job that involves advanced ICT | IS2G28J
Learning how to use ICT applications will help me do the work I am interested in | IS2G28K
Values: 1. Strongly agree; 2. Agree; 3. Disagree; 4. Strongly disagree; Missing
Coding: Dummy coding, option with highest percentage is reference category (Missing = reference category)
Regressor: PCA

Variable | Name
Coding tasks at school - To display information in different ways | IS2G29A
Coding tasks at school - To break a complex process into smaller parts | IS2G29B
Coding tasks at school - To understand diagrams that describe or show real-world problems | IS2G29C
Coding tasks at school - To plan tasks by setting out the steps needed to complete them | IS2G29D
Coding tasks at school - To use tools to make diagrams that help solve problems | IS2G29E
Coding tasks at school - To use simulations to help understand or solve real-world problems | IS2G29F
Coding tasks at school - To make flow diagrams to show the different parts of a process | IS2G29G
Coding tasks at school - To record and evaluate data to understand and solve a problem | IS2G29H
Coding tasks at school - To use real-world data to review and revise solutions to problems | IS2G29I
Values: 1. Strongly agree; 2. Agree; 3. Disagree; 4. Strongly disagree; Missing
Coding: Dummy coding, option with highest percentage is reference category (Missing = reference category)
Regressor: PCA

Variable | Name
Study [computing, computer science, information technology, informatics, or similar] in the current school year | IS2G30
Values: 1. Yes; 2. No; Missing
Coding: Dummy coding, option with highest percentage is reference category (Missing = reference category)
Regressor: PCA
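The Regressor column lists PCA for these variables. This suggests that the dummy-coded responses enter the conditioning model through principal components rather than as individual regressors. The sketch below shows that step with NumPy. The function name `principal_components` and the 90% variance threshold are hypothetical choices made for illustration. They are not values taken from this appendix.

```python
import numpy as np


def principal_components(X, share=0.90):
    """Return principal component scores that together explain at least
    `share` of the total variance of X (illustrative sketch).

    X: array of shape (n_students, n_dummies), e.g. 0/1 dummy variables.
    The 0.90 default is a hypothetical threshold, not an ICILS setting.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each dummy variable
    # SVD of the centered data: rows of Vt are the principal axes, and the
    # squared singular values are proportional to each component's variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    if var.sum() == 0:
        raise ValueError("all variables are constant; nothing to summarize")
    cumulative = np.cumsum(var) / var.sum()
    # Smallest number of components whose cumulative share reaches `share`.
    k = min(int(np.searchsorted(cumulative, share)) + 1, len(var))
    return Xc @ Vt[:k].T  # component scores, shape (n_students, k)


rng = np.random.default_rng(0)
X = (rng.random((50, 6)) > 0.5).astype(float)  # made-up 0/1 data
scores = principal_components(X)
print(scores.shape)
```

Computing the components through an SVD of the centered data matrix avoids forming the covariance matrix explicitly. The retained scores are uncorrelated with one another, which is one reason principal components are a convenient way to enter many correlated dummy variables into a regression.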


IEA’s International Computer and Information Literacy Study (ICILS) 2018 investigated 
how well students are prepared for study, work, and life in a digital world. ICILS 2018 
measured international differences in students’ computer and information literacy (CIL): 
their ability to use computers to investigate, create, participate, and communicate at 
home, at school, in the workplace, and in the community. Participating countries had 
an additional option for their students to complete an assessment of computational 
thinking (CT): their ability to recognize aspects of real-world problems appropriate for 
computational formulation, and to evaluate and develop algorithmic solutions to those 
problems, so that the solutions could be operationalized with a computer. 


This technical report follows the publication of several international and regional 
reports that presented the results of ICILS 2018. It provides a comprehensive account 
of the conceptual, methodological, and analytical implementation of the study. It 
includes detailed information on the development of the data-collection instruments 
used, including their translation and translation verification, on sampling design and 
implementation, sampling weights and participation rates, survey operation procedures, 
quality control of data collection, data management and creation of the international 
database, scaling procedures, and analysis of ICILS 2018 data. The technical report
enables researchers to evaluate published reports and articles based on data from this
study and, when used in conjunction with the ICILS 2018 User Guide for the International
Database, provides guidance for their own analyses.




